IN THE UNITED STATES PATENT AND TRADEMARK OFFICE
In re Patent Application of
BEN SHEN, WEN LIU, STEVEN D.
CHRISTENSON and SCOTT STANDAGE
For: GENE CLUSTER FOR
PRODUCTION OF THE ENEDIYNE
ANTITUMOR ANTIBIOTIC C-1027
Patent Application
Assistant Commissioner for Patents
Washington, D.C. 20231
By Express Mail No: EL160743652US
Dated: January 5, 2000
PATENT APPLICATION TRANSMITTAL
Sir:
Transmitted herewith for filing is the patent application of inventor(s) Ben Shen, Wen
Liu, Steven D. Christenson and Scott Standage, for "GENE CLUSTER FOR PRODUCTION OF
THE ENEDIYNE ANTITUMOR ANTIBIOTIC C-1027." Enclosed are:
1. 64 pages of the specification, including 71 claims and an abstract.
2. 11 sheets of drawings.
3. 79 pages of Sequence Listing.
4. An oath or declaration of the inventors (unsigned).
San Francisco, California
The filing fee is being deferred at this time.
Dated: January 5, 2000.
Tom Hunter (Reg. No. 38,498)
MAJESTIC, PARSONS, SIEBERT & HSUE P.C.
Four Embarcadero Center, Suite 1100
San Francisco, California 94111-4106
Telephone: (415)248-5500
Atty. Docket: 2500.128US1 Facsimile: (415)362-5418
UCRef: 99-174-1
Docket No: 2500.128US1
Client Ref: 99-174-1
In the United States Patent and Trademark Office
U.S. Patent Application For
GENE CLUSTER FOR PRODUCTION OF THE ENEDIYNE
ANTITUMOR ANTIBIOTIC C-1027
Inventor(s): BEN SHEN, a citizen of the People's Republic of China, residing at
1842 Rushmore Lane, Davis, CA 95616, USA
WEN LIU, a citizen of the People's Republic of China, residing at the
Institute of Medicinal Biotechnology, Tiantan, Beijing, 100005, China
STEVEN D. CHRISTENSON, a citizen of the United States of
America, residing at 1079 Monarch Lane, Davis, CA, 95616, USA
SCOTT STANDAGE, a citizen of the United Kingdom, residing at
63 Tudor Road, Bornet, Herts, EN5 5NW, U.K.
Assignee: The Regents of the University of California
Entity: Small Entity
Majestic, Parsons, Siebert & Hsue P.C.
Four Embarcadero Center, Suite 1100
San Francisco, CA 94111-4106
Tel: 415 248-5500
Fax: 415 362-5418
GENE CLUSTER FOR PRODUCTION OF THE ENEDIYNE
ANTITUMOR ANTIBIOTIC C-1027
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims benefit under 35 U.S.C. §119 of provisional
application USSN 60/115,434, filed on January 6, 1999, which is herein incorporated by
reference in its entirety for all purposes.
STATEMENT AS TO RIGHTS TO INVENTIONS MADE UNDER FEDERALLY
SPONSORED RESEARCH AND DEVELOPMENT
This work was supported in part by a grant from the Cancer Research
Coordinating Committee, University of California, the National Institutes of Health grant
CA78747, and the Searle Scholars Program/The Chicago Community Trust. The
Government of the United States of America may have certain rights in this invention.
FIELD OF THE INVENTION
This invention relates to the field of enediyne antibiotics. In particular, this
invention elucidates the gene cluster controlling the biosynthesis of the C-1027 enediyne.
BACKGROUND OF THE INVENTION
The enediyne antibiotics are currently the focus of intense research activity in
the fields of chemistry, biology, and medical sciences, because of their unique molecular
architecture, biological activities, and modes of action (Doyle and Borders (1995) Enediyne
antibiotics as antitumor agents. Marcel-Dekker, New York; Thorson et al. (1999) Bioorg.
Chem., 27: 172-188). Since the unveiling of the structure of neocarzinostatin chromophore
(Edo et al. (1985) Tetrahedron Lett. 26: 331-340) in 1985, the enediyne family has grown
steadily. Thus far, there have been three basic groups within the enediyne antibiotic family:
(a) the calicheamicin/esperamicin type, which includes the calicheamicins, the esperamicins,
and namenamicin, (b) the dynemicin type, and (c) the chromoprotein type, consisting of an
apoprotein and an unstable enediyne chromophore. The latter group includes
neocarzinostatin, kedarcidin, C-1027 (Fig. 1), and maduropeptin, whose enediyne
chromophore structures have been established, as well as several others whose enediyne
chromophore structures are yet to be determined due to their instability (Thorson et al.
(1999) Bioorg. Chem., 27: 172-188). N1999A2, in contrast to the other chromoproteins,
exists as an enediyne chromophore alone despite the fact that its structure is very similar to
the other chromoprotein chromophores (Ando et al. (1998) Tetra. Letts., 39: 6495-6480).
As a family, the enediyne antibiotics are the most potent, highly active
antitumor agents ever discovered. Some members are 1000 times more potent than
adriamycin, one of the most effective, clinically used antitumor antibiotics (Zhen et al.
(1989) J. Antibiot. 42: 1294-1298). All members of this family contain a unit consisting of
two acetylenic groups conjugated to a double bond or incipient double bond within a nine- or
ten-membered ring, i.e., the enediyne core as exemplified by C-1027 in Fig. 1. As a
consequence of this structural feature, these compounds share a common mechanism of
action: the enediyne core undergoes an electronic rearrangement to form a transient
benzenoid diradical, which is positioned in the minor groove of DNA so as to damage DNA
by abstracting hydrogen atoms from deoxyriboses on both strands (Fig. 1). Reaction of the
resulting deoxyribose carbon-centered radicals with molecular oxygen initiates a process that
results in both single-strand and double-strand DNA cleavages (Doyle and Borders (1995)
Enediyne antibiotics as antitumor agents. Marcel-Dekker, New York; Ikemoto et al. (1995)
Proc. Natl. Acad. Sci. USA 92: 10506-10510; Myers et al. (1997) J. Am. Chem. Soc. 119:
2965-2972; Stassinopoulos et al. (1996) Science 272: 1943-1946; Thorson et al. (1999)
Bioorg. Chem., 27: 172-188; Xu et al. (1997) J. Am. Chem. Soc. 119: 1133-1134). This
novel mechanism of DNA damage has important implications for their application as potent
cancer chemotherapeutic agents (Doyle and Borders (1995) supra; Sievers et al. (1999)
Blood 93:3678-3684).
As an alternative to making structural analogs of microbial metabolites by
chemical synthesis, manipulation of the genes governing secondary metabolism offers a
promising route to preparing these compounds biosynthetically (Cane et al.
(1998) Science 282: 63-68; Hutchinson and Fujii (1995) Ann. Rev. Microbiol. 49: 201-38;
Katz and Donadio (1993) Ann. Rev. Microbiol. 47: 875-912). The success of the latter
approach depends critically on the availability of novel genetic systems and on genes
encoding novel enzyme activities. The enediynes offer a distinct opportunity to study the
biosynthesis of their unique molecular scaffolds and the mechanism of self-resistance to
extremely cytotoxic natural products. Elucidation of these aspects provides access to
rational engineering of enediyne biosynthesis for novel drug leads and makes it possible to
construct enediyne overproducing strains by de-regulating the biosynthetic machinery. In
addition, elucidation of an enediyne gene cluster contributes to the general field of
combinatorial biosynthesis by expanding the repertoire of novel polyketide synthase (PKS)
and deoxysugar biosynthesis genes as well as other genes uniquely associated with enediyne
biosynthesis, leading to the making of novel enediynes via combinatorial biosynthesis.
SUMMARY OF THE INVENTION
This invention provides nucleic acid sequences and characterization of the
gene cluster responsible for the biosynthesis of the enediyne C-1027 (produced by
Streptomyces globisporus). In particular, structural and functional characterization is
provided for the 50 open reading frames (ORFs) comprising this gene cluster. Thus, in one
embodiment, this invention provides an isolated nucleic acid comprising a nucleic acid
selected from the group consisting of a nucleic acid encoding any of C-1027 open reading
frames (ORFs) -7 through 42, excluding ORF 9 (cagA); a nucleic acid encoding a
polypeptide encoded by any of C-1027 open reading frames (ORFs) -7 through 42, excluding
ORF 9 (cagA); and a nucleic acid amplified by polymerase chain reaction (PCR) using
primer pairs that amplify any of C-1027 open reading frames (ORFs) -7 through 42,
excluding ORF 9 (cagA). In one embodiment, preferred nucleic acids comprise a nucleic
acid encoding at least two (more preferably at least three or more) open reading frames
(ORFs) selected from the group consisting of ORF-1 through ORF 42, excluding ORF 9
(cagA).
In another embodiment this invention provides an isolated nucleic acid
comprising a nucleic acid that specifically hybridizes under stringent conditions to an open
reading frame (ORF) of the C-1027 biosynthesis gene cluster, excluding ORF 9 (cagA), and
can substitute for the ORF to which it specifically hybridizes to direct the synthesis of an
enediyne. In certain embodiments this also includes nucleic acids that would stringently
hybridize as indicated above but for the degeneracy of the genetic code. In other words,
if silent mutations could be made in the subject sequence so that it hybridizes to the indicated
sequence(s) under stringent conditions, it would be included in certain embodiments.
Particularly preferred nucleic acids comprise a nucleic acid that specifically hybridizes
under stringent conditions to a nucleic acid selected from the group consisting of ORF -7,
ORF -6, ORF -5, ORF -4, ORF -3, ORF -2, ORF -1, ORF 0, ORF 1, ORF 2, ORF 3, ORF 4,
ORF 5, ORF 6, ORF 7, ORF 8, ORF 10, ORF 11, ORF 12, ORF 13, ORF 14, ORF 15, ORF
16, ORF 17, ORF 18, ORF 19, ORF 20, ORF 21, ORF 22, ORF 23, ORF 24, ORF 25, ORF
26, ORF 27, ORF 28, ORF 29, ORF 30, ORF 31, ORF 32, ORF 33, ORF 34, ORF 35, ORF
36, ORF 37, ORF 38, ORF 39, ORF 40, ORF 41, and ORF 42. Particularly preferred
isolated nucleic acids comprise a nucleic acid selected from the group consisting of ORF -7,
ORF -6, ORF -5, ORF -4, ORF -3, ORF -2, ORF -1, ORF 0, ORF 1, ORF 2, ORF 3, ORF 4,
ORF 5, ORF 6, ORF 7, ORF 8, ORF 10, ORF 11, ORF 12, ORF 13, ORF 14, ORF 15, ORF
16, ORF 17, ORF 18, ORF 19, ORF 20, ORF 21, ORF 22, ORF 23, ORF 24, ORF 25, ORF
26, ORF 27, ORF 28, ORF 29, ORF 30, ORF 31, ORF 32, ORF 33, ORF 34, ORF 35, ORF
36, ORF 37, ORF 38, ORF 39, ORF 40, ORF 41, and ORF 42. The nucleic acid may
comprise a nucleic acid that is a single nucleotide polymorphism (SNP) of a nucleic acid
selected from the group consisting of ORF -7, ORF -6, ORF -5, ORF -4, ORF -3, ORF -2,
ORF -1, ORF 0, ORF 1, ORF 2, ORF 3, ORF 4, ORF 5, ORF 6, ORF 7, ORF 8, ORF 10,
ORF 11, ORF 12, ORF 13, ORF 14, ORF 15, ORF 16, ORF 17, ORF 18, ORF 19, ORF 20,
ORF 21, ORF 22, ORF 23, ORF 24, ORF 25, ORF 26, ORF 27, ORF 28, ORF 29, ORF 30,
ORF 31, ORF 32, ORF 33, ORF 34, ORF 35, ORF 36, ORF 37, ORF 38, ORF 39, ORF 40,
ORF 41, and ORF 42.
This invention also provides an isolated gene cluster comprising open reading
frames encoding polypeptides sufficient to direct the assembly of a C-1027 enediyne or a C-
1027 enediyne analogue. The gene cluster may be present in a cell, more preferably in a
bacterial cell (e.g. Actinomycetes, Actinoplanetes, Actinomadura, Micromonospora, or
Streptomycetes). Particularly preferred bacterial cells include, but are not limited to,
Streptomyces globisporus, Streptomyces lividans, Streptomyces coelicolor, Micromonospora
echinospora ssp. calichensis, Actinomadura verrucosospora, Micromonospora chersina,
Streptomyces carzinostaticus, and Actinomycete L585-6. The gene cluster may contain one
or more open reading frames operatively linked to a heterologous promoter (e.g. a
constitutive or an inducible promoter).
This invention also provides a polypeptide encoded by any one or more
of the nucleic acids described herein.
Also provided are host cell(s) (e.g. eukaryotic cells or bacterial cells as
described herein) transformed with one or more of the expression vectors described herein.
Preferred host cells are transformed with an exogenous nucleic acid comprising a gene
cluster encoding polypeptides sufficient to direct the assembly of a C-1027 enediyne or a
C-1027 enediyne analogue. In certain embodiments, the heterologous nucleic acid may comprise
only a portion of the gene cluster, but the cell will still be able to express an enediyne.
This invention also provides methods of chemically modifying a biological
molecule. The methods involve contacting a biological molecule that is a substrate for a
polypeptide encoded by a C-1027 biosynthesis gene cluster open reading frame, with a
polypeptide encoded by a C-1027 biosynthesis gene cluster open reading frame whereby the
polypeptide chemically modifies the biological molecule. In one preferred embodiment, the
polypeptide is an enzyme selected from the group consisting of a hydroxylase, a
homocysteine synthase, a dNDP-glucose dehydrogenase, a citrate carrier protein, a C-methyl
transferase, an N-methyl transferase, an aminotransferase, a CagA apoprotein, an
NDP-glucose synthase, an epimerase, an acyl transferase, a coenzyme F390 synthase, an
epoxide hydrolase, an anthranilate synthase, a glycosyl transferase, a monooxygenase, a
type II condensation protein, an aminomutase, a type II adenylation protein, an O-methyl
transferase, a P-450 hydroxylase, an oxidoreductase, and a proline oxidase. In a preferred
embodiment the method involves contacting the biological molecule with at least two
(preferably at least three or more) different polypeptides encoded by C-1027 biosynthesis
gene cluster open reading frames. The contacting may be in a host cell (e.g. a eukaryotic cell
or a bacterial cell) or the contacting can be ex vivo. The biological molecule can be an
endogenous metabolite produced by said host cell or an exogenously supplied metabolite. In
preferred embodiments, the host cell is a bacterial cell or eukaryotic cell (e.g., a mammalian
cell, a yeast cell, a plant cell, a fungal cell, an insect cell, etc.). In certain preferred
embodiments, the host cell synthesizes sugars and glycosylates the biological molecule. In
other preferred embodiments, the host cell synthesizes deoxysugars. The method can further
involve contacting the biological molecule with a polyketide synthase or a non-ribosomal
polypeptide synthetase. The contacting can be in a cell (e.g., a bacterial cell) or ex vivo. In
one preferred embodiment the method comprises contacting the biological molecule with
substantially all of the polypeptides encoded by C-1027 biosynthesis gene cluster open
reading frames and said method produces an enediyne or enediyne analogue. In another
preferred embodiment, the biological molecule is a fatty acid and the biological molecule is
contacted with a C-1027 orf polypeptide selected from the group consisting of an epoxide
hydrase, a monooxygenase, an iron-sulfur flavoprotein, a P-450 hydroxylase, an
oxidoreductase, and a proline oxidase. In certain embodiments, the biological molecule is a
fatty acid and said biological molecule is contacted with a plurality of C-1027 orf
polypeptides comprising an epoxide hydrase, a monooxygenase, an iron-sulfur flavoprotein,
a P-450 hydroxylase, an oxidoreductase, and a proline oxidase. In one especially preferred
embodiment, the biological molecule is contacted with polypeptides encoded by ORF 17,
ORF 20, ORF 21, ORF 29, ORF 30, ORF 32, ORF 35, and ORF 38. In another especially
preferred embodiment, the biological molecule is contacted with polypeptides encoded by
ORF 15, ORF 16, ORF 28, ORF 3, ORF 14, and ORF 13, and, in certain embodiments, ORF
4 and ORF 3 as well.
In certain embodiments, the method may comprise contacting a sugar with
one or more C-1027 open reading frame polypeptides selected from the group consisting of a
dNDP-glucose synthase, a dNDP glucose dehydratase, an epimerase, an aminotransferase, a
C-methyltransferase, an N-methyltransferase, and a glycosyl transferase. Particularly
preferred variants of this method comprise contacting a dNDP-glucose with a plurality of
C-1027 open reading frame polypeptides comprising a dNDP-glucose synthase, a dNDP
glucose dehydratase, an epimerase, an aminotransferase, a C-methyltransferase, an N-
methyltransferase, and a glycosyl transferase.
In certain other embodiments, the method comprises contacting an amino acid
with one or more C-1027 open reading frame polypeptides selected from the group
consisting of a hydroxylase, an aminomutase, a type II NRPS condensation enzyme, a type II
NRPS adenylation enzyme, and a type II peptidyl carrier protein. These methods may
involve contacting an amino acid with a plurality of C-1027 open reading frame polypeptides
comprising a hydroxylase, a halogenase, an aminomutase, a type II NRPS condensation
enzyme, a type II NRPS adenylation enzyme, and a type II peptidyl carrier protein. In
particularly preferred embodiments, the amino acid is a tyrosine.
This invention also provides a method of synthesizing a chromoprotein-type
enediyne core, said method comprising contacting a fatty acid with one or more C-1027 orf
polypeptides selected from the group consisting of an epoxide hydrase, a monooxygenase, an
iron-sulfur flavoprotein, a P-450 hydroxylase, an oxidoreductase, and a proline oxidase. In
preferred embodiments, the fatty acid may be contacted with a plurality of C-1027 orf
polypeptides comprising an epoxide hydrase, a monooxygenase, an iron-sulfur flavoprotein,
a P-450 hydroxylase, an oxidoreductase, and a proline oxidase. In particularly preferred
embodiments, the fatty acid is contacted with polypeptides encoded by ORF 17, ORF 20,
ORF 21, ORF 29, ORF 30, ORF 32, ORF 35, and ORF 38.
In yet another embodiment, this invention provides a method of
synthesizing a deoxysugar. This method involves contacting a sugar with one or more
C-1027 open reading frame polypeptides selected from the group consisting of a dNDP-glucose
synthase, a dNDP glucose dehydratase, an epimerase, an aminotransferase, a
C-methyltransferase, an N-methyltransferase, and a glycosyl transferase. In preferred
embodiments, this method involves contacting a dNDP-glucose with a plurality of C-1027
open reading frame polypeptides comprising a dNDP-glucose synthase, a dNDP glucose
dehydratase, an epimerase, an aminotransferase, a C-methyltransferase, an
N-methyltransferase, and a glycosyl transferase. In particularly preferred embodiments, the
dNDP-glucose is contacted with polypeptides encoded by ORF 17, ORF 20, ORF 21, ORF 29,
ORF 30, ORF 32, ORF 35, and ORF 38.
This invention also provides methods of synthesizing a beta amino acid by
contacting an amino acid with one or more C-1027 open reading frame polypeptides
selected from the group consisting of a hydroxylase, an aminomutase, a type II NRPS
condensation enzyme, a type II NRPS adenylation enzyme, and a type II peptidyl carrier
protein. The method preferably comprises contacting an amino acid with a plurality of
C-1027 open reading frame polypeptides comprising a hydroxylase, a halogenase, an
aminomutase, a type II NRPS condensation enzyme, a type II NRPS adenylation enzyme,
and a type II peptidyl carrier protein. Particularly preferred embodiments comprise
contacting the amino acid (e.g. tyrosine) with polypeptides encoded by ORF 4, ORF 11,
ORF 24, ORF 23, ORF 25, and ORF 26.
Also provided are methods of synthesizing an enediyne or an enediyne
analogue. These methods involve culturing a cell (e.g. a eukaryotic cell or a bacterium)
comprising a recombinantly modified C-1027 gene cluster under conditions whereby said
cell expresses said enediyne or enediyne analogue; and recovering the enediyne or enediyne
analogue. In preferred embodiments, the gene cluster is present in a bacterium (e.g.,
Actinomycetes, Actinoplanetes, Actinomadura, Micromonospora, or Streptomycetes).
Particularly preferred bacteria include, but are not limited to, Streptomyces globisporus,
Streptomyces lividans, Streptomyces coelicolor, Micromonospora echinospora ssp.
calichensis, Actinomadura verrucosospora, Micromonospora chersina, Streptomyces
carzinostaticus, and Actinomycete L585-6. In another preferred embodiment, the gene
cluster is present in a eukaryotic cell (e.g. a mammalian cell, a yeast cell, a plant cell, a
fungal cell, an insect cell, etc.). The host cell can be one that synthesizes sugars and
glycosylates the enediyne or enediyne analogue. The host can be one that synthesizes
deoxysugars.
This invention also provides a method of making a cell (e.g., a bacterial or
eukaryotic cell) resistant to an enediyne or an enediyne metabolite. This method involves
expressing in the cell one or more isolated C-1027 open reading frame nucleic acids that
encode a protein selected from the group consisting of a CagA apoprotein, a SgcB
transmembrane efflux protein, a transmembrane transport protein, a Na+/H+ transporter, an
ABC transporter, a glycerol phosphate transporter, and a UvrA-like protein. In preferred
embodiments, the isolated C-1027 open reading frame nucleic acids are selected from the
group consisting of ORF 9, ORF 2, ORF 27, ORF 0, ORF 1 C-terminus, ORF 2, and ORF 1
N-terminus. Certain embodiments exclude cagA (ORF 9).
In one embodiment, this invention specifically excludes one or more of open
reading frames -7 through 42. In particular, in one embodiment this invention excludes cagA
(ORF 9), and/or sgcA (ORF 1), and/or sgcB (ORF 2).
DEFINITIONS
The terms "C-1027 open reading frame", and "C-1027 ORF" refer to an open
reading frame in the C-1027 biosynthesis gene cluster as isolated from Streptomyces
globisporus. The term also embraces the same open reading frames as present in other
enediyne-synthesizing organisms (e.g. other strains and/or species of Streptomyces,
Actinomyces, and the like). The term encompasses allelic variants and single nucleotide
polymorphisms (SNPs). In certain instances the C-1027 ORF is used synonymously with the
polypeptide encoded by the C-1027 ORF and may include conservative substitutions in that
polypeptide. The particular usage will be clear from context.
The terms "isolated", "purified", or "biologically pure" refer to material which
is substantially or essentially free from components which normally accompany it as found
in its native state. With respect to nucleic acids and/or polypeptides the term can refer to
nucleic acids or polypeptides that are no longer flanked by the sequences typically flanking
them in nature.
The terms "polypeptide", "peptide" and "protein" are used interchangeably
herein to refer to a polymer of amino acid residues. The terms apply to amino acid polymers
in which one or more amino acid residue is an artificial chemical analogue of a
corresponding naturally occurring amino acid, as well as to naturally occurring amino acid
polymers. The term also includes variants on the traditional peptide linkage joining the
amino acids making up the polypeptide.
The terms "nucleic acid" or "oligonucleotide" or grammatical equivalents
herein refer to at least two nucleotides covalently linked together. A nucleic acid of the
present invention is preferably single-stranded or double-stranded and will generally contain
phosphodiester bonds, although in some cases, as outlined below, nucleic acid analogs are
included that may have alternate backbones, comprising, for example, phosphoramide
(Beaucage et al. (1993) Tetrahedron 49: 1925 and references therein; Letsinger (1970) J.
Org. Chem. 35: 3800; Sprinzl et al. (1977) Eur. J. Biochem. 81: 579; Letsinger et al. (1986)
Nucl. Acids Res. 14: 3487; Sawai et al. (1984) Chem. Lett. 805; Letsinger et al. (1988) J. Am.
Chem. Soc. 110: 4470; and Pauwels et al. (1986) Chemica Scripta 26: 141-9),
phosphorothioate (Mag et al. (1991) Nucleic Acids Res. 19: 1437; and U.S. Patent No.
5,644,048), phosphorodithioate (Briu et al. (1989) J. Am. Chem. Soc. 111: 2321),
O-methylphosphoroamidite linkages (see Eckstein, Oligonucleotides and Analogues: A
Practical Approach, Oxford University Press), and peptide nucleic acid backbones and
linkages (see Egholm (1992) J. Am. Chem. Soc. 114: 1895; Meier et al. (1992) Chem. Int. Ed.
Engl. 31: 1008; Nielsen (1993) Nature 365: 566; Carlsson et al. (1996) Nature 380: 207).
Other analog nucleic acids include those with positive backbones (Denpcy et al. (1995)
Proc. Natl. Acad. Sci. USA 92: 6097); non-ionic backbones (U.S. Patent Nos. 5,386,023,
5,637,684, 5,602,240, 5,216,141 and 4,469,863; Angew. Chem. Intl. Ed. English (1991) 30:
423; Letsinger et al. (1988) J. Am. Chem. Soc. 110: 4470; Letsinger et al. (1994) Nucleoside
& Nucleotide 13: 1597; Chapters 2 and 3, ACS Symposium Series 580, "Carbohydrate
Modifications in Antisense Research", Ed. Y.S. Sanghvi and P. Dan Cook; Mesmaeker et al.
(1994) Bioorganic & Medicinal Chem. Lett. 4: 395; Jeffs et al. (1994) J. Biomolecular NMR
34: 17; Tetrahedron Lett. 37: 743 (1996)); and non-ribose backbones, including those described
in U.S. Patent Nos. 5,235,033 and 5,034,506, and Chapters 6 and 7, ACS Symposium Series
580, "Carbohydrate Modifications in Antisense Research", Ed. Y.S. Sanghvi and P. Dan Cook.
Nucleic acids containing one or more carbocyclic sugars are also included within the
definition of nucleic acids (see Jenkins et al. (1995) Chem. Soc. Rev. pp. 169-176). Several
nucleic acid analogs are described in Rawls, C & E News, June 2, 1997, page 35. These
modifications of the ribose-phosphate backbone may be done to facilitate the addition of
additional moieties such as labels, or to increase the stability and half-life of such molecules
in physiological environments.
The term "heterologous" as it relates to nucleic acid sequences such as coding
sequences and control sequences, denotes sequences that are not normally associated with a
region of a recombinant construct, and/or are not normally associated with a particular cell.
Thus, a "heterologous" region of a nucleic acid construct is an identifiable segment of
nucleic acid within or attached to another nucleic acid molecule that is not found in
association with the other molecule in nature. For example, a heterologous region of a
construct could include a coding sequence flanked by sequences not found in association
with the coding sequence in nature. Another example of a heterologous coding sequence is a
construct where the coding sequence itself is not found in nature (e.g., synthetic sequences
having codons different from the native gene). Similarly, a host cell transformed with a
construct which is not normally present in the host cell would be considered heterologous for
purposes of this invention.
A "coding sequence" or a sequence which "encodes" a particular polypeptide
(e.g. a PKS, an NRPS, etc.), is a nucleic acid sequence which is ultimately transcribed and/or
translated into that polypeptide in vitro and/or in vivo when placed under the control of
appropriate regulatory sequences. In certain embodiments, the boundaries of the coding
sequence are determined by a start codon at the 5' (amino) terminus and a translation stop
codon at the 3' (carboxy) terminus. A coding sequence can include, but is not limited to,
cDNA from procaryotic or eucaryotic mRNA, genomic DNA sequences from procaryotic or
eucaryotic DNA, and even synthetic DNA sequences. In preferred embodiments, a
transcription termination sequence will usually be located 3' to the coding sequence.
Expression "control sequences" refers collectively to promoter sequences,
ribosome binding sites, polyadenylation signals, transcription termination sequences,
upstream regulatory domains, enhancers, and the like, which collectively provide for the
transcription and translation of a coding sequence in a host cell. Not all of these control
sequences need always be present in a recombinant vector so long as the desired gene is
capable of being transcribed and translated.
"Recombination" refers to the reassortment of sections of DNA or RNA
sequences between two DNA or RNA molecules. "Homologous recombination" occurs
between two DNA molecules which hybridize by virtue of homologous or complementary
nucleotide sequences present in each DNA molecule.
The terms "stringent conditions" or "hybridization under stringent conditions"
refer to conditions under which a probe will hybridize preferentially to its target
subsequence, and to a lesser extent to, or not at all to, other sequences. "Stringent
hybridization" and "stringent hybridization wash conditions" in the context of nucleic acid
hybridization experiments such as Southern and northern hybridizations are sequence
dependent, and are different under different environmental parameters. An extensive guide
to the hybridization of nucleic acids is found in Tijssen (1993) Laboratory Techniques in
Biochemistry and Molecular Biology - Hybridization with Nucleic Acid Probes, part I, chapter
2, "Overview of principles of hybridization and the strategy of nucleic acid probe assays",
Elsevier, New York. Generally, highly stringent hybridization and wash conditions are
selected to be about 5°C lower than the thermal melting point (Tm) for the specific sequence
at a defined ionic strength and pH. The Tm is the temperature (under defined ionic strength
and pH) at which 50% of the target sequence hybridizes to a perfectly matched probe. Very
stringent conditions are selected to be equal to the Tm for a particular probe.
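By way of illustration only, the temperature rule stated in the preceding paragraph can be sketched in Python. The function name `hybridization_temperature` and the stringency labels are hypothetical and are not part of this specification; the Tm is taken as a measured or separately calculated input, since no formula for it is given here.

```python
def hybridization_temperature(tm_celsius: float, stringency: str) -> float:
    """Return a hybridization/wash temperature in deg C for a given Tm.

    Hypothetical helper. Only two rules are encoded, both taken from the
    text: "highly stringent" conditions are about 5 deg C below the Tm,
    and "very stringent" conditions are equal to the Tm.
    """
    offsets = {
        "highly_stringent": -5.0,  # about 5 deg C lower than Tm
        "very_stringent": 0.0,     # equal to Tm
    }
    if stringency not in offsets:
        raise ValueError(f"unknown stringency level: {stringency!r}")
    return tm_celsius + offsets[stringency]


# Usage with an assumed Tm of 70 deg C (an illustrative value only).
highly = hybridization_temperature(70.0, "highly_stringent")
very = hybridization_temperature(70.0, "very_stringent")
```

The Tm depends on the particular probe, ionic strength, and pH, so any value passed to such a helper would have to be determined for the conditions actually used.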
An example of stringent hybridization conditions for hybridization of
complementary nucleic acids which have more than 100 complementary residues on a filter
in a Southern or northern blot is 50% formamide with 1 mg of heparin at 42°C, with the
hybridization being carried out overnight. An example of highly stringent wash conditions is
0.15 M NaCl at 72°C for about 15 minutes. An example of stringent wash conditions is a
0.2x SSC wash at 65°C for 15 minutes (see, Sambrook et al (1989) Molecular Cloning - A
Laboratory Manual (2nd ed.) Vol. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor
Press, NY, for a description of SSC buffer). Often, a high stringency wash is preceded by a
low stringency wash to remove background probe signal. An example medium stringency
wash for a duplex of, e.g., more than 100 nucleotides, is 1x SSC at 45°C for 15 minutes. An
example low stringency wash for a duplex of, e.g., more than 100 nucleotides, is 4-6x SSC at
40°C for 15 minutes. In general, a signal to noise ratio of 2x (or higher) than that observed
for an unrelated probe in the particular hybridization assay indicates detection of a specific
hybridization. Nucleic acids which do not hybridize to each other under stringent conditions
are still substantially identical if the polypeptides which they encode are substantially
identical. This occurs, e.g., when a copy of a nucleic acid is created using the maximum
codon degeneracy permitted by the genetic code.
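The example wash conditions and the signal-to-noise criterion given above can be collected, purely as an illustrative sketch, in the following Python fragment. The names `WashCondition`, `EXAMPLE_WASHES`, and `is_specific_hybridization` are hypothetical; the buffer, temperature, and time values are the examples recited above, and the 2x threshold is the ratio stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WashCondition:
    """One example wash step: buffer, temperature (deg C), duration (min)."""
    buffer: str
    temperature_c: int
    minutes: int


# Example conditions as recited in the text (hypothetical dictionary keys).
EXAMPLE_WASHES = {
    "highly_stringent": WashCondition("0.15 M NaCl", 72, 15),
    "stringent": WashCondition("0.2x SSC", 65, 15),
    "medium": WashCondition("1x SSC", 45, 15),
    "low": WashCondition("4-6x SSC", 40, 15),
}


def is_specific_hybridization(probe_signal: float,
                              unrelated_probe_signal: float) -> bool:
    """Apply the 2x (or higher) signal-to-noise criterion from the text.

    The signal for the probe of interest is compared with the signal
    observed for an unrelated probe in the same hybridization assay.
    """
    if unrelated_probe_signal <= 0:
        raise ValueError("unrelated probe signal must be positive")
    return probe_signal / unrelated_probe_signal >= 2.0
```

In such a sketch a low stringency wash would typically be applied before the high stringency wash, consistent with the order described above for removing background probe signal.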
Expression vectors are defined herein as nucleic acid sequences that direct
the transcription of cloned copies of genes/cDNAs and/or the translation of their mRNAs in
an appropriate host. Such vectors can be used to express genes or cDNAs in a variety of
hosts such as bacteria, blue-green algae, plant cells, insect cells and animal cells. Expression
vectors include, but are not limited to, cloning vectors, modified cloning vectors, specifically
designed plasmids or viruses. Specifically designed vectors allow the shuttling of DNA
between hosts, such as bacteria-yeast or bacteria-animal cells. An appropriately constructed
expression vector preferably contains: an origin of replication for autonomous replication in
a host cell, a selectable marker, optionally one or more restriction enzyme sites, and optionally
one or more constitutive or inducible promoters. In preferred embodiments, an expression
vector is a replicable DNA construct in which a DNA sequence encoding one or more PKS
and/or NRPS domains and/or modules is operably linked to suitable control sequences
capable of effecting the expression of the products of these synthases and/or synthetases in a
suitable host. Control sequences include a transcriptional promoter, an optional operator
sequence to control transcription and sequences which control the termination of
transcription and translation, and so forth.
The term "conservative substitution" is used in reference to proteins or
peptides to reflect amino acid substitutions that do not substantially alter the activity
(specificity or binding affinity) of the molecule. Typically conservative amino acid
substitutions involve substitution of one amino acid for another amino acid with similar
chemical properties (e.g. charge or hydrophobicity). The following six groups each contain
amino acids that are typical conservative substitutions for one another: 1) Alanine (A),
Serine (S), Threonine (T); 2) Aspartic acid (D), Glutamic acid (E); 3) Asparagine (N),
Glutamine (Q); 4) Arginine (R), Lysine (K); 5) Isoleucine (I), Leucine (L), Methionine (M),
Valine (V); and 6) Phenylalanine (F), Tyrosine (Y), Tryptophan (W).
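By way of illustration only, the six groups listed above can be expressed as a simple lookup. The following Python sketch is not part of the disclosed methods; the name is_conservative is a hypothetical helper, and it treats a substitution as conservative only when both residues fall in the same one of the six groups.

```python
# The six groups of typical conservative substitutions, in one-letter code.
CONSERVATIVE_GROUPS = [
    set("AST"),   # 1) Alanine, Serine, Threonine
    set("DE"),    # 2) Aspartic acid, Glutamic acid
    set("NQ"),    # 3) Asparagine, Glutamine
    set("RK"),    # 4) Arginine, Lysine
    set("ILMV"),  # 5) Isoleucine, Leucine, Methionine, Valine
    set("FYW"),   # 6) Phenylalanine, Tyrosine, Tryptophan
]


def is_conservative(original, replacement):
    """Return True if the two residues differ and share one of the six groups."""
    a, b = original.upper(), replacement.upper()
    if a == b:
        return False  # no substitution took place
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


if __name__ == "__main__":
    for pair in [("D", "E"), ("L", "V"), ("A", "W")]:
        print(pair, is_conservative(*pair))
```

Residues outside the six groups (for example glycine or proline) are never reported as conservative by this sketch, which reflects the list as written and is not a general statement about those residues.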
The "group consisting of ORF-1 through ORF 42" refers to the group
consisting of ORF -7, ORF -6, ORF -5, ORF -4, ORF -3, ORF -2, ORF -1, ORF 0, ORF 1,
ORF 2, ORF 3, ORF 4, ORF 5, ORF 6, ORF 7, ORF 8, ORF 9, ORF 10, ORF 11, ORF 12,
ORF 13, ORF 14, ORF 15, ORF 16, ORF 17, ORF 18, ORF 19, ORF 20, ORF 21, ORF 22,
ORF 23, ORF 24, ORF 25, ORF 26, ORF 27, ORF 28, ORF 29, ORF 30, ORF 31, ORF 32,
ORF 33, ORF 34, ORF 35, ORF 36, ORF 37, ORF 38, ORF 39, ORF 40, ORF 41, and ORF 42
as identified in Tables I and II. In certain embodiments ORF 9 (cagA) is excluded.
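As a quick consistency check on this definition, the group can be enumerated programmatically. The sketch below is illustrative only; the function name orf_group and the exclude_caga flag are hypothetical. It assumes the ORFs are numbered with consecutive integers from -7 to 42, with Table I covering -7 to 26 and Table II covering 27 to 42, as the table captions state.

```python
# ORFs are assumed to be numbered with consecutive integers from -7 to 42.
ORF_NUMBERS = list(range(-7, 43))


def orf_group(exclude_caga=False):
    """Return the ORF numbers in the group, optionally without ORF 9 (cagA)."""
    return [n for n in ORF_NUMBERS if not (exclude_caga and n == 9)]


# Split of the group across the two tables, following the table captions.
TABLE_I = [n for n in ORF_NUMBERS if n <= 26]
TABLE_II = [n for n in ORF_NUMBERS if n >= 27]

if __name__ == "__main__":
    print(len(orf_group()), len(orf_group(exclude_caga=True)))
    print(len(TABLE_I), len(TABLE_II))
```

Under these assumptions the full group has 50 members, which agrees with the 50 open reading frames reported in the Detailed Description.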
A "biological molecule that is a substrate for a polypeptide encoded by an
enediyne (e.g., C-1027) biosynthesis gene" refers to a molecule that is chemically modified
by one or more polypeptides encoded by open reading frame(s) of the C-1027 biosynthesis
gene cluster. The "substrate" may be a native molecule that typically participates in the
biosynthesis of an enediyne, or can be any other molecule that can be similarly acted upon
by the polypeptide.
A "polymorphism" is a variation in the DNA sequence of some members of a
species. A polymorphism is thus said to be "allelic," in that, due to the existence of the
polymorphism, some members of a species may have the unmutated sequence (i.e. the
original "allele") whereas other members may have a mutated sequence (i.e. the variant or
mutant "allele"). In the simplest case, only one mutated sequence may exist, and the
polymorphism is said to be diallelic. In the case of diallelic diploid organisms, three
genotypes are possible. They can be homozygous for one allele, homozygous for the other
allele or heterozygous. In the case of diallelic haploid organisms, they can have one allele or
the other, thus only two genotypes are possible. The occurrence of alternative mutations can
give rise to triallelic, etc. polymorphisms. An allele may be referred to by the nucleotide(s)
that comprise the mutation.
"Single nucleotide polymorphisms" or "SNPs" are defined by their
characteristic attributes. A central attribute of such a polymorphism is that it contains a
polymorphic site, "X," most preferably occupied by a single nucleotide, which is the site of
the polymorphism's variation (Goelet and Knapp U.S. patent application Ser. No.
08/145,145). Methods of identifying SNPs are well known to those of skill in the art (see,
e.g., U.S. Patent 5,952,174).
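The genotype counts given above for diallelic polymorphisms (three for diploid organisms, two for haploid organisms) follow from counting unordered combinations of alleles. The short Python sketch below is offered only as a worked illustration; the function name genotypes and the allele labels are hypothetical.

```python
from itertools import combinations_with_replacement


def genotypes(alleles, ploidy):
    """List the distinct unordered genotypes for a set of alleles at one site."""
    return list(combinations_with_replacement(sorted(alleles), ploidy))


if __name__ == "__main__":
    # Diallelic site in a diploid organism: two homozygotes and one heterozygote.
    print(genotypes({"A", "G"}, ploidy=2))
    # Diallelic site in a haploid organism: one allele or the other.
    print(genotypes({"A", "G"}, ploidy=1))
    # A triallelic site in a diploid organism.
    print(len(genotypes({"A", "G", "T"}, ploidy=2)))
```

For n alleles in a diploid organism the count is n(n+1)/2, so a triallelic polymorphism allows six genotypes.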
Abbreviations used herein include LB, Luria-Bertani; NGDH, dNDP-glucose
4,6-dehydratase; nt, nucleotide; ORF, open reading frame; PCR, polymerase chain reaction;
PEG, polyethyleneglycol; PKS, polyketide synthase; RBS, ribosomal binding site; Apr,
apramycin; R, resistant; Th, thiostrepton; WT, wild-type; and TS, temperature sensitive.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 illustrates the structures of C-1027 chromophore and the benzenoid
diradical intermediate proposed to initiate DNA cleavage.
Figure 2 illustrates a scheme using C-1027 open reading frame polypeptides
for the synthesis of deoxysugars.
Figure 3A illustrates a scheme using C-1027 open reading frame polypeptides
for the synthesis of a β-amino acid.
Figure 3B illustrates a scheme using C-1027 open reading frame polypeptides
for the synthesis of a benzoxazolinate.
Figure 4 illustrates the synthesis of the enediyne core and final assembly of
the C-1027 enediyne.
Figures 5A, 5B, and 5C illustrate the organization of the C-1027 enediyne
biosynthetic gene cluster. Figure 5A shows a restriction map of the 75-kb sgc gene cluster
from S. globisporus as represented by three cosmid clones. Figure 5B illustrates the genetic
organization of the sgcA, sgcB, and cagA genes, showing that they are clustered in the sgc
gene cluster. Probe 1, the 0.55-kb dNDP-glucose 4,6-dehydratase gene fragment from
pBS1002. Probe 2, the 0.73-kb cagA fragment from pBS1003. A, ApaI; B, BamHI; E,
EcoRI; K, KpnI; S, SacII; Sp, SphI. Figure 5C shows the genetic organization of the C-1027
biosynthesis gene cluster.
Figure 6 shows the DNA and deduced amino acid sequences of the 3.0-kb
BamHI fragment from pBS1007, showing the sgcA and sgcB genes. Possible RBSs are
boxed. The presumed translational start and stop sites are in boldface. Restriction enzyme
sites of interest are underlined. The amino acids, according to which the degenerate PCR
primers were designed for amplifying the dNDP-glucose 4,6-dehydratase gene from S.
globisporus, are underlined.
Figure 7 shows the amino acid sequence alignment of SgcA with three other
dNDP-glucose 4,6-dehydratases. Gdh, TDP-glucose 4,6-dehydratase of S. erythraea
(AAA68211); MtmE, TDP-glucose 4,6-dehydratase in the mithramycin pathway of S.
argillaceus (CAA71847); TylA2, TDP-glucose 4,6-dehydratase in the tylosin pathway of S.
fradiae (S49054). Given in parentheses are protein accession numbers. The αβα fold with
the NAD+-binding motif of GxGxxG is boxed.
Figures 8A, 8B, and 8C show disruption of sgcA by single crossover homologous
recombination. Figure 8A shows construction of the sgcA disruption mutant and restriction
maps of the wild-type S. globisporus C-1027 and S. globisporus SB1001 mutant strains
showing predicted fragment sizes upon BamHI digestion. Figures 8B and 8C show a
Southern analysis of S. globisporus C-1027 (lane 1) and S. globisporus SB1001 (lanes 2, 3,
and 4, three individual isolates) genomic DNA, digested with BamHI, using (Figure 8B)
pOJ260 vector or (Figure 8C) the 0.75-kb SacII/KpnI fragment of sgcA from pBS1012 as a
probe, respectively. B, BamHI; K, KpnI; S, SacII.
Figures 9A, 9B, 9C, and 9D illustrate the determination of C-1027 production
in various S. globisporus strains by assaying their antibacterial activity against M. luteus.
Figure 9A: 1, S. globisporus C-1027; 2, 3, and 4, S. globisporus SB1001 (three individual
isolates); 5, S. globisporus AF67; 6, S. globisporus AF40. Figure 9B: 1, S. globisporus
C-1027; 2, S. globisporus SB1001 (pWHM3); 3 and 4, S. globisporus SB1001 (pBS1015) (two
individual isolates). Both S. globisporus SB1001 (pWHM3) and S. globisporus SB1001
(pBS1015) were grown in the presence of 5 μg/mL thiostrepton. Figure 9C: 1, S.
globisporus C-1027; 2, S. globisporus SB1001 (pBS1015); 3, S. globisporus SB1001; 4, S.
globisporus SB1001 (pWHM3); 5, S. globisporus AF40; 6, S. globisporus AF44. All S.
globisporus strains were grown in the absence of thiostrepton. Figure 9D: 1, S. globisporus
(pKC1139); 2, S. globisporus (pBS1018).
DETAILED DESCRIPTION
This invention provides a complete gene cluster regulating the biosynthesis of
C-1027, the most potent member of the enediyne antitumor antibiotic family. C-1027 is
produced by Streptomyces globisporus C-1027 and consists of an apoprotein (encoded by the
cagA gene) and a non-peptidic chromophore. The C-1027 chromophore could be viewed as
being derived biosynthetically from a benzoxazolinate, a deoxyamino hexose, a β-amino
acid, and an enediyne core. Adopting a strategy to clone the C-1027 biosynthesis gene
cluster by mapping a putative dNDP-glucose 4,6-dehydratase (NGDH) gene to cagA, we
localized 75 kb contiguous DNA from S. globisporus encoding a complete C-1027 gene
cluster.
Initial sequencing of the cloned gene cluster revealed two genes, sgcA and
sgcB, that encode an NGDH enzyme and a transmembrane efflux protein, respectively, and
confirmed that the cagA gene resides approximately 14 kb upstream of the sgcA,B locus.
The involvement of the cloned gene cluster in C-1027 biosynthesis was demonstrated by
disrupting the sgcA gene to generate C-1027-nonproducing mutants and by complementing
the sgcA mutants in vivo to restore C-1027 production.
Subsequent DNA sequence analysis provided the complete enediyne C-1027
gene cluster sequence (SEQ ID NOs: 1 and 2) revealing 50 open reading frames which are
summarized in Tables I and II. These results represent the first cloning of a gene cluster for
enediyne anti-tumor antibiotic biosynthesis.
Table I. C-1027 gene cluster open reading frames (-7 to 26), primers for ORF
amplification, and proposed functions.

orf#  Size         Relative      Primers                           Function                       Seq ID
                   position                                                                       No.
-7    648 bp       658-11        Fwd: ATG GGC ATG ACG GGT          very weak homology to          3
                                 Rev: CTA GAG GAT CCC GGG          [illegible]                    4
-6    549 bp       1478-930      Fwd: [illegible]                  [illegible] potentiator        5
                                 Rev: [illegible]                  [illegible]                    6
-5    1065 bp      2713-1649     Fwd: ATG ACC ATC GCC ACT          N-truncated methionine         7
                                 Rev: TCA GAG GCC GAG CAC          ([illegible] pseudogene)       8
-4    387 bp       3238-2851     Fwd: ATG AGC TCG CTA CTG          Viral transcription            9
                                 Rev: CTA GGA GCC GGT [illegible]  factor                         10
-3    1530 bp      4971-3442     Fwd: ATG AGC AGC AGC GCC          Viral homolog,                 11
                                 Rev: TCA TTC GTC GGC TGC          possibly primase               12
-2    3027 bp      5982-7478     Fwd: GTG AGG GCT CTG CCG          Glycerol-phosphate [illegible] 13
                                 Rev: TCA [illegible]              transporter [illegible]        14
-1    2328 bp      9900-7573     Fwd: GTG AGC GTC [illegible]      UvrA-like drug                 15
                                 Rev: TCA ACC CGC CCT GCG          resistance pump                16
0     1368 bp      11349-9982    Fwd: [illegible]                  [illegible] efflux             17
                                 Rev: GTG GCT GTG CTC [illegible]  pump                           18
1     999 bp       28590-29588   Fwd: [illegible]                  [illegible]                    19
                                 Rev: [illegible]                                                 20
2     1566 bp      29632-31197   Fwd: [illegible]                  [illegible]                    21
                                 Rev: TCA TGT [illegible]          efflux protein                 22
3     1311 bp      31280-32590   Fwd: GTG GAG TAC TGG AAC          Coenzyme F390 synthase         23
                                 Rev: TCA GGC CTG AGG GGC          phenylacetyl-CoA ligase        24
4     1584 bp      32809-34392   Fwd: GTG CCC CAC GGT GCA          phenol hydroxylase             25
                                 Rev: CTA CAG CCC TCC GAG          chlorophenol-4-monooxygenase   26
5     [illegible]  35274-34458   Fwd: ATG TCT TCA ACC CGT          citrate transport              27
                                 Rev: TCA GCC GCG CAG GAA          protein                        28
6     1272 bp      17924-16653   Fwd: ATG CTG GAG AAA TGC          C-methyltransferase            29
                                 Rev: TCA GAC GAG CTC CTT          hydroxylase                    30
7     735 bp       16653-15919   Fwd: ATG GAG TAC GGC CCC          N-methyltransferase            31
                                 Rev: TCA TGC CGT GCG CAC                                         32
8     1233 bp      15922-14690   Fwd: ATG AGC GGC GGC CCG          Aminotransferase               33
                                 Rev: TCA CCT CGC CGG ACG                                         34
9     432 bp       14643-14212   Fwd: ATG TCG TTA CGT CAC          CagA                           35
                                 Rev: TCA GCC GAA GGT CAG                                         36
10    1068 bp      13012-14079   Fwd: ATG AAG GCA CTT GTA          dNTP-glucose synthase          37
                                 Rev: TCA GGC CGC GAT CTC                                         38
11    1485 bp      12835-11351   Fwd: GTG GAC GTG TCA GCG          Hydroxylase,                   39
                                 Rev: TCA GGA CCG CGC ACC          Halogenase                     40
12    579 bp       25564-24986   Fwd: ATG AAG CCG ATC GGG          dNTP-4-keto-6-deoxyglucose     41
                                 Rev: TCA GGA CGA CTT GTT          3,5-epimerase                  42
13    1137 bp      24702-23566   Fwd: ATG CCT TCC CCC TTC          3-O-acyltransferase            43
                                 Rev: TCA GGT GCG CTC GGC                                         44
14    1455 bp      22878-21424   Fwd: GTG AGA GAC GGC CGG          Coenzyme F-390 synthase        45
                                 Rev: TCA CGT GGT GAT GGC          phenylacetyl-CoA ligase        46
15    1482 bp      21407-19926   Fwd: ATG ACC GAC CAG TGC          Anthranilate synthase I        47
                                 Rev: [illegible] CAA CTC                                         48
16    663 bp       19929-19267   Fwd: GTG AGC TTG TGG TCT          Anthranilate synthase II       49
                                 Rev: TCA GGC CGG TTC GGC                                         50
17    1161 bp      19191-18031   Fwd: GTG CGT CCC TTC CGT          epoxide hydrolase              51
                                 Rev: TCA GCG GAG CGG ACG                                         52
18    423 bp       35938-35516   Fwd: ATG CCA GCA CCG ACT          Unknown                        53
                                 Rev: TCA GTC GTT GCC GCG                                         54
19    1380 bp      27214-28593   Fwd: ATG CGG GTG ATG ATC          glycosyl transferase           55
                                 Rev: TCA TCG GTC CGC CTC                                         56
20    1356 bp      25815-27170   Fwd: ATG ACC AAG CAC GCC          squalene monooxygenase         57
                                 Rev: TCA TAC GGC GGC GCC                                         58
21    672 bp       23546-22875   Fwd: GTG AGC GCA CAA CTC          hypothetical Fe-S              59
                                 Rev: TCA CGG CTG TGC CTG          flavoprotein                   60
22    816 bp       35274-34458   Fwd: ATG TCT TCA ACC CGT          haloacetate dehalogenase       61
                                 Rev: TCA GCC GCG CAG GAA          hydrolase                      62
23    1380 bp      37559-38938   Fwd: ATG ACG ACG TCC GAC          peptide synthetase             63
                                 Rev: TCA GGA GGT GAA GGG                                         64
24    1620 bp      40986-39367   Fwd: ATG GCA TTG ACT CAA          Histidine ammonia lyase        65
                                 Rev: TCA GCG CAG CTG GAT                                         66
25    1560 bp      42611-41052   Fwd: ATG ACG CGG CCG GTG          Type II adenylation            67
                                 Rev: TCA GCG GGT GAG CCG          protein                        68
26    282 bp       38983-39264   Fwd: GTG TCC ACC GTT TCC          Type II peptidyl               69
                                 Rev: TCA CTG CGT TCC GGA          carrier protein                70
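The Size and Relative position columns of Table I can be cross-checked against one another, since an inclusive coordinate range spans |end - start| + 1 base pairs. The Python sketch below is illustrative only. The helper names are hypothetical, the rows are a sample copied from the table as transcribed here, and the reading that a start coordinate larger than the end coordinate indicates an ORF on the reverse strand is an assumption, not something the table states.

```python
# Sample rows copied from Table I as transcribed: (orf, stated size in bp, start, end).
ROWS = [
    ("-7", 648, 658, 11),
    ("-4", 387, 3238, 2851),
    ("-2", 3027, 5982, 7478),
    ("9", 432, 14643, 14212),
    ("10", 1068, 13012, 14079),
]


def span_length(start, end):
    """Length in bp of an inclusive coordinate range, in either orientation."""
    return abs(end - start) + 1


def strand(start, end):
    """Assumed convention: start > end means the ORF lies on the reverse strand."""
    return "-" if start > end else "+"


def inconsistent_rows(rows):
    """Return the ORFs whose stated size disagrees with their coordinates."""
    return [orf for orf, size, start, end in rows if span_length(start, end) != size]


if __name__ == "__main__":
    for orf, size, start, end in ROWS:
        print(orf, strand(start, end), size, span_length(start, end))
    print("rows to recheck:", inconsistent_rows(ROWS))
```

On this sample, the entries for orf -4 and orf -2 do not agree with their coordinates as transcribed, so those two rows are worth checking against the sequence listing before the coordinates are relied upon.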
Table II. C-1027 gene cluster open reading frames (27 to 42), primers for ORF
amplification, and proposed functions.

ORF   Relative      Primers                        Function                 SEQ ID
      Position                                                              NO.
27    43945-46023   Fwd: GTG TGC CCG GTG ACA GAC   Antibiotic transporter   71
                    Rev: TCA GCC CAC GGG CTG GGA                            72
28    46167-47171   Fwd: GTG TTG GGC GAT GAG GAC   O-methyltransferase      73
                    Rev: TCA GAC CGC GGA CAT CTG                            74
29    47227-48485   Fwd: ATG GCC GGC CTG GTC ATG   p450 hydroxylase         75
                    Rev: TCA GGA CCC GAG GGT CAC                            76
30    48610-49714   Fwd: GTG GAC CAG ACG TCT ACG   Oxidoreductase           77
                    Rev: TCA TGC AGG TGC AGC GTG                            78
31    [illegible]   Fwd: ATG AGG CCG CTC GTT CGG   Unknown protein          79
                    Rev: TCA TCC CGG CCC GGC GGC                            80
32    51420-52341   Fwd: ATG AGA ACG CGG CGA CGC   Oxidoreductase           81
                    Rev: TCA CGG CCG GAG GCG TAC                            82
33    53241-54074   Fwd: GTG TAT CAG CCG GAC TGT   Unknown protein          83
                    Rev: CTA CTC ATT CCA GTT GTG                            84
34    54230-55379   Fwd: ATG TCT ACG GGC TAT CTC   Unknown protein          85
                    Rev: TCA GCC GCC GGT GGC GCC                            86
35    56027-56881   Fwd: ATG TTC TCC CCC GCC GCC   Oxidase/dehydrogenase    87
                    Rev: TCA GTA CGC CTG GTG GGC                            88
36    56928-57730   Fwd: ATG AAT TCG CTC GAC GAC   Unknown protein          89
                    Rev: TCA GCT CCC GGT CGC CGC                            90
37    57834-58304   Fwd: ATG ACC GCG ACG AAT CCT   Regulatory               91
                    Rev: CTA GGC GGC GCG TCC CGC                            92
38    58440-60091   Fwd: ATG AGC ACC ACG GCC GAG   Oxidoreductase           93
                    Rev: TCA GCC GCG CGC CGA CGG                            94
39    60092-60622   Fwd: ATG ACC CTG GAG GCC TAC   Regulatory               95
                    Rev: TCA TGC GGG GCT CCC GGT                            96
40    60940-62020   Fwd: GTG AAA AGT GAC TCT GCC   Regulatory               97
                    Rev: TCA ACG GCG AGT TGG CTG                            98
41    62045-62899   Fwd: GTG ACC ACG AAC ACC ATC   Regulatory               99
                    Rev: TCA CCC GCG ATC TCG ATC                            100
42    62788-63164   Fwd: (partial ORF)             p450 hydroxylase         101
                    Rev: TCA CCT CGC CGT ACT CAC                            102
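One reading of the primer columns, offered here as an observation and not as a statement made by the tables, is that each forward primer begins at a start codon (ATG or GTG) and each reverse primer begins with the reverse complement of a stop codon (TCA for TGA, CTA for TAG), so that a primer pair brackets a complete open reading frame. The Python sketch below checks that pattern on a few pairs copied from Table II; the helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
START_CODONS = {"ATG", "GTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def clean(primer):
    """Remove the spaces used in the tables to group bases into triplets."""
    return primer.replace(" ", "").upper()


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return clean(seq).translate(COMPLEMENT)[::-1]


def fwd_begins_at_start_codon(primer):
    return clean(primer)[:3] in START_CODONS


def rev_covers_stop_codon(primer):
    # The first three bases of the reverse primer pair with the last codon of the ORF.
    return reverse_complement(clean(primer)[:3]) in STOP_CODONS


# (forward, reverse) primer pairs copied from Table II.
PAIRS = {
    "27": ("GTG TGC CCG GTG ACA GAC", "TCA GCC CAC GGG CTG GGA"),
    "33": ("GTG TAT CAG CCG GAC TGT", "CTA CTC ATT CCA GTT GTG"),
    "37": ("ATG ACC GCG ACG AAT CCT", "CTA GGC GGC GCG TCC CGC"),
}

if __name__ == "__main__":
    for orf, (fwd, rev) in PAIRS.items():
        print(orf, fwd_begins_at_start_codon(fwd), rev_covers_stop_codon(rev))
```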
Surprisingly, sequence analysis failed to reveal any gene that resembles a
polyketide synthase. The C-1027 open reading frames, however, encode polypeptides
exhibiting a wide variety of enzymatic activities (e.g., epoxide hydrase, monooxygenase,
oxidoreductase, P-450 hydroxylase, etc.). The isolated C-1027 gene cluster can be used to
synthesize C-1027 enediyne antibiotics and/or analogues thereof. The C-1027 gene cluster
can be modified and/or augmented to increase C-1027 and/or C-1027 analogue production.
Alternatively, various components of the C-1027 gene cluster can be used to
synthesize and/or chemically modify a wide variety of metabolites. Thus, for example, ORF
6 (C-methyltransferase) can be used to methylate a carbon, while ORF 12, an epimerase, can
be used to change the conformation of a sugar. The ORFs can be combined in their native
configuration or in modified configurations to synthesize a wide variety of
biomolecules/metabolites. Thus, for example, various combinations of C-1027 open reading
frames can be used to synthesize an enediyne core, to synthesize a deoxy sugar, to synthesize
a β-amino acid, to make a benzoxazolinate, etc. (see, e.g., Figures 2, 3, and 4).
The native C-1027 gene cluster ORFs can be re-ordered, modified, and
combined with other biosynthetic units (e.g. polyketide synthases (PKSs) or catalytic
domains thereof and/or non-ribosomal polypeptide synthetases (NRPSs) or catalytic domains
thereof) to produce a wide variety of molecules. Large chemical libraries can be produced
and then screened for a desired activity.
The C-1027 gene cluster also includes a number of drug resistance genes (see,
e.g., Table III) that confer resistance to C-1027 and/or metabolites involved in C-1027
biosynthesis, thereby permitting the cell to complete the enediyne biosynthesis. These
resistance genes can be used to confer enediyne resistance on a cell lacking such resistance
or to augment the enediyne resistance of a cell that does tolerate enediynes. Such cells can
be used to produce high levels of enediynes and/or enediyne metabolites, and/or enediyne
analogues.
Table III. C-1027 cluster drug resistance genes.

ORF       Protein                              Mechanism
ORF 9     CagA apoprotein                      Drug sequestering
ORF 2     SgcB transmembrane efflux protein    Drug exporting
ORF 27    Transmembrane transport protein      Drug exporting
ORF 0     Na+/H+ transporter                   Drug exporting
ORF -1    ABC transport (C-terminus)           Drug exporting
ORF -2    Glycerol phosphate transporter       Drug exporting
ORF -1    UvrA-like protein (N-terminus)       DNA repairing
I. Isolation, preparation, and expression of C-1027 nucleic acids.
The C-1027 gene cluster nucleic acids can be isolated, optionally modified,
and inserted into a host cell to create and/or modify a metabolic (biosynthetic) pathway and
thereby enable that host cell to synthesize and/or modify various metabolites. Alternatively
the C-1027 gene cluster nucleic acids can be expressed in the host cell and the encoded
C-1027 polypeptide(s) recovered for use as chemical reagents, e.g. in the ex vivo synthesis
and/or chemical modification of various metabolites. Either application typically entails
insertion of one or more nucleic acids encoding one or more isolated and/or modified C-1027
enediyne open reading frames in a suitable host cell. The nucleic acid(s) are typically in an
expression vector, a construct containing control elements suitable to direct expression of the
C-1027 polypeptides. The expressed C-1027 polypeptides in the host cell then act as
components of a metabolic/biosynthetic pathway (in which case the synthetic product of the
pathway is typically recovered) or the C-1027 polypeptides themselves are recovered. Using
the sequence information provided herein, cloning and expression of C-1027 nucleic acids
can be accomplished using routine and well known methods.
A) C-1027 nucleic acids.
The nucleic acids comprising the C-1027 gene cluster are identified in Tables
I and II and are listed in the sequence listing provided herein. In particular, Table I identifies genes
and functions of open reading frames (ORFs) in the C-1027 enediyne biosynthesis gene
cluster and identifies primers suitable for the amplification/isolation of any one or more of
the C-1027 open reading frames. Of course, using the sequence information provided herein,
other primers suitable for amplification/isolation of one or more C-1027 open reading frames
can be determined according to standard methods well known to those of skill in the art (e.g.
using Vector NTI Suite™, InforMax, Gaithersburg, MD, USA).
Typically such amplifications will utilize the DNA or RNA of an organism
containing the requisite genes (e.g. Streptomyces globisporus) as a template. Typical
amplification conditions include the following PCR temperature program: initial denaturing
at 94°C for 5 min, 24-36 cycles of 45 sec at 94°C, 1 min at 60°C, 2 min at 72°C, followed by
an additional 7 min at 72°C. One of skill will appreciate that optimization of such a protocol,
e.g. to improve yield, etc. is routine (see, e.g., U.S. Patent No. 4,683,202; Innis (1990) PCR
Protocols: A Guide to Methods and Applications, Academic Press Inc., San Diego, CA, etc.).
In addition, primers may be designed to introduce restriction sites and so facilitate cloning of
the amplified sequence into a vector.
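For planning purposes, the temperature program recited above can be written down as data and its nominal run time computed. The Python sketch below is illustrative only. The step labels (denature, anneal, extend) are the conventional interpretation of the three temperatures and are not stated in the text, the names Step and total_seconds are hypothetical, and instrument ramp times between temperatures are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str      # conventional label, assumed for readability
    temp_c: int    # temperature in degrees Celsius
    seconds: int   # hold time


INITIAL = Step("initial denaturation", 94, 5 * 60)
CYCLE = (
    Step("denature", 94, 45),
    Step("anneal", 60, 60),
    Step("extend", 72, 2 * 60),
)
FINAL = Step("final extension", 72, 7 * 60)


def total_seconds(cycles):
    """Nominal hold time of the whole program, ignoring ramp times."""
    if not 24 <= cycles <= 36:
        raise ValueError("the recited program uses 24-36 cycles")
    per_cycle = sum(step.seconds for step in CYCLE)
    return INITIAL.seconds + cycles * per_cycle + FINAL.seconds


if __name__ == "__main__":
    for n in (24, 36):
        print(n, "cycles:", total_seconds(n) / 60, "min")
```

Each cycle holds for 225 seconds, so the program runs for a nominal 102 minutes at 24 cycles and 147 minutes at 36 cycles.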
In one embodiment, this invention provides nucleic acids for the recombinant
expression of an enediyne (e.g. a C-1027 enediyne or an analogue thereof). Such nucleic
acids include isolated gene cluster(s) comprising open reading frames encoding polypeptides
sufficient to direct the assembly of the enediyne. In other embodiments of this invention, the
C-1027 open reading frames may be unchanged, but the control elements (e.g. promoters,
enhancers, etc.) may be modified. In still other embodiments, the nucleic acids may encode
selected components (e.g. one or more C-1027 or modified C-1027 open reading frames)
and/or may optionally contain other heterologous biosynthetic elements including, but not
limited to polyketide synthase (PKS) and/or non-ribosomal polypeptide synthetase (NRPS)
modules or enzymatic domains.
Such variations may be introduced by design, for example to modify a known
molecule in a specific way, e.g. by replacing a single substituent of the enediyne with
another, thereby creating a derivative enediyne molecule of predicted structure.
Alternatively, variations can be made randomly, for example by making a library of
molecular variants of a known enediyne by systematically or haphazardly replacing one or
more open reading frames in the biosynthetic pathway. Production of alternative/modified
enediyne, and hybrid enediyne PKSs and/or NRPSs and hybrid systems is described below.
Using the information provided herein other approaches to cloning the desired
sequences will be apparent to those of skill in the art. For example, the enediyne, and/or
optionally PKS and/or NRPS modules or enzymatic domains of interest can be obtained
from an organism that expresses such, using recombinant methods, such as by screening
cDNA or genomic libraries, derived from cells expressing the gene, or by deriving the gene
from a vector known to include the same. The gene can then be isolated and combined with
other desired biosynthetic elements using standard techniques. If the gene in question is
already present in a suitable expression vector, it can be combined in situ, with, e.g., other
PKS subunits, as desired. The gene of interest can also be produced synthetically, rather
than cloned. The nucleotide sequence can be designed with the appropriate codons for the
particular amino acid sequence desired. In general, one will select preferred codons for the
intended host in which the sequence will be expressed. The complete sequence can be
assembled from overlapping oligonucleotides prepared by standard methods and assembled
into a complete coding sequence (see, e.g., Edge (1981) Nature 292: 756; Nambiar et al.
(1984) Science 223: 1299; Jay et al. (1984) J. Biol. Chem. 259: 6311). In addition, it is noted
that custom gene synthesis is commercially available (see, e.g. Operon Technologies,
Alameda, CA).
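The selection of preferred codons for an intended host can be sketched as a back-translation step. The Python example below is a minimal illustration and not a disclosed method. In place of a measured codon-usage table for the host, it uses a crude stand-in: for each amino acid it picks the synonymous codon with the most G and C bases, on the assumption that the intended host is a high-GC organism such as Streptomyces. The function names are hypothetical, and the 15-nucleotide test string is the forward primer listed for orf 10 in Table I, used only as a convenient input.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG-ordered amino acid string.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string codon by codon (spaces are ignored)."""
    dna = dna.replace(" ", "").upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def gc_rich_codon_choice():
    """Assumed stand-in for host codon usage: prefer GC-rich, G/C-ending codons."""
    best = {}
    for codon, aa in sorted(CODON_TABLE.items()):
        score = (sum(base in "GC" for base in codon), codon[2] in "GC")
        if aa not in best or score > best[aa][0]:
            best[aa] = (score, codon)
    return {aa: codon for aa, (score, codon) in best.items()}


PREFERRED = gc_rich_codon_choice()


def back_translate(peptide):
    """Encode a peptide using the preferred codon for each residue."""
    return "".join(PREFERRED[aa] for aa in peptide.upper())


if __name__ == "__main__":
    original = "ATG AAG GCA CTT GTA"
    peptide = translate(original)
    recoded = back_translate(peptide)
    print(peptide, recoded, translate(recoded) == peptide)
```

The recoded sequence differs from the input at the nucleotide level while encoding the same polypeptide, which is the situation contemplated by the definition of substantially identical nucleic acids given earlier. A real design would substitute a codon-usage table measured for the chosen host.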
Examples of such techniques and instructions sufficient to direct persons of
skill through many cloning exercises are found in Berger and Kimmel (1989) Guide to
Molecular Cloning Techniques, Methods in Enzymology 152 Academic Press, Inc., San
Diego, CA (Berger); Sambrook et al. (1989) Molecular Cloning - A Laboratory Manual (2nd
ed.) Vol. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor Press, NY; Ausubel
(1994) Current Protocols in Molecular Biology, Current Protocols, a joint venture between
Greene Publishing Associates, Inc. and John Wiley & Sons, Inc.; U.S. Patent 5,017,478; and
European Patent No. 0,246,864.
B) Expression of C-1027 open reading frames.
The choice of expression vector depends on the sequence(s) that are to be
expressed. Any transducible cloning vector can be used as a cloning vector for the nucleic
acid constructs of this invention. However, where large clusters are to be expressed, it is
preferred that phagemids, cosmids, P1s, YACs, BACs, PACs, HACs or similar cloning vectors
be used for cloning the nucleotide sequences into the host cell. Phagemids, cosmids, and BACs, for
example, are advantageous vectors due to the ability to insert and stably propagate therein
larger fragments of DNA than in M13 phage and lambda phage, respectively. Phagemids
which will find use in this method generally include hybrids between plasmids and
filamentous phage cloning vehicles. Cosmids which will find use in this method generally
include lambda phage-based vectors into which cos sites have been inserted. Recipient pool
cloning vectors can be any suitable plasmid. The cloning vectors into which pools of
mutants are inserted may be identical or may be constructed to harbor and express different
genetic markers (see, e.g., Sambrook et al., supra). The utility of employing such vectors
having different marker genes may be exploited to facilitate a determination of successful
transduction.
In preferred embodiments of this invention, vectors are used to introduce
C-1027 biosynthesis genes or gene clusters into host (e.g. Streptomyces) cells. Numerous
vectors for use in particular host cells are well known to those of skill in the art. For
example, Malpartida and Hopwood (1984) Nature, 309: 462-464; Kao et al.
(1994) Science, 265: 509-512; and Hopwood et al. (1987) Methods Enzymol., 153: 116-166
all describe vectors for use in various Streptomyces hosts.
In one preferred embodiment, Streptomyces vectors are used that include
sequences that allow their introduction and maintenance in E. coli. Such Streptomyces/E.
coli shuttle vectors have been described (see, for example, Vara et al. (1989) J. Bacteriol.,
171: 5872-5881; Guilfoile & Hutchinson (1991) Proc. Natl. Acad. Sci. USA, 88: 8553-8557).
The wildtype and/or modified C-1027 enediyne open reading frame(s) of this
invention can be inserted into one or more expression vectors, using methods known to
those of skill in the art. Expression vectors will include control sequences operably linked to
the desired open reading frame. Suitable expression systems for use with the present
invention include systems that function in eucaryotic and/or prokaryotic host cells.
However, as explained above, prokaryotic systems are preferred, and in particular, systems
compatible with Streptomyces spp. are of particular interest. Control elements for use in
such systems include promoters, optionally containing operator sequences, and ribosome
binding sites. Particularly useful promoters include control sequences derived from
enediyne, and/or PKS, and/or NRPS gene clusters. Other promoters (e.g. ermE* as
illustrated in Example 1) are also suitable. Other bacterial promoters, such as those derived
from sugar metabolizing enzymes, such as galactose, lactose (lac) and maltose, will also find
use in the present constructs. Additional examples include promoter sequences derived from
biosynthetic enzymes such as tryptophan (trp), the beta-lactamase (bla) promoter system,
bacteriophage lambda PL, and T5. In addition, synthetic promoters, such as the tac promoter
(U.S. Patent 4,551,433), which do not occur in nature also function in bacterial host cells. In
Streptomyces, numerous promoters have been described including constitutive promoters,
such as ErmE and TcmG (Shen and Hutchinson (1994) J. Biol. Chem. 269: 30726-30733),
as well as controllable promoters such as actI and actIII (Pieper et al. (1995) Nature,
378: 263-266; Pieper et al. (1995) J. Am. Chem. Soc., 117: 11373-11374; and Wiesmann et
al. (1995) Chem. & Biol. 2: 583-589).
Other regulatory sequences may also be desirable which allow for regulation
of expression of the enediyne open reading frame(s) relative to the growth of the host cell.
Regulatory sequences are known to those of skill in the art, and examples include those
which cause the expression of a gene to be turned on or off in response to a chemical or
physical stimulus, including the presence of a regulatory compound. Other types of
regulatory elements may also be present in the vector, for example, enhancer sequences.
Selectable markers can also be included in the recombinant expression
vectors. A variety of markers are known which are useful in selecting for transformed cell
lines and generally comprise a gene whose expression confers a selectable phenotype on
transformed cells when the cells are grown in an appropriate selective medium. Such
markers include, for example, genes that confer antibiotic resistance or sensitivity to the
plasmid.
The various enediyne cluster open reading frames, and/or PKS, and/or NRPS
clusters or subunits of interest can be cloned into one or more recombinant vectors as
individual cassettes, with separate control elements, or under the control of, e.g., a single
promoter. The various open reading frames can include flanking restriction sites to allow for
the easy deletion and insertion of other open reading frames so that hybrid synthetic
pathways can be generated. The design of such unique restriction sites is known to those of
skill in the art and can be accomplished using the techniques described above, such as
site-directed mutagenesis and PCR.
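The requirement that flanking restriction sites be unique can be checked computationally before a cassette is built. The Python sketch below is illustrative only. The function names are hypothetical, the enzyme set is limited to the six enzymes named in the description of Figure 5, and the short ORF used in the usage lines is an invented test string, not a sequence from the cluster. Because all six recognition sequences are palindromic, searching one strand is sufficient.

```python
# Recognition sequences of the six enzymes named in the description of Figure 5.
# All six are palindromic, so scanning a single strand finds sites on both.
SITES = {
    "ApaI": "GGGCCC",
    "BamHI": "GGATCC",
    "EcoRI": "GAATTC",
    "KpnI": "GGTACC",
    "SacII": "CCGCGG",
    "SphI": "GCATGC",
}


def usable_flanking_enzymes(orf):
    """Enzymes whose site is absent from the ORF and so can flank it uniquely."""
    orf = orf.upper()
    return [name for name, site in SITES.items() if site not in orf]


def flank(orf, five_prime, three_prime):
    """Return the ORF with a site added at each end, insisting both stay unique."""
    cassette = SITES[five_prime] + orf.upper() + SITES[three_prime]
    for enzyme in {five_prime, three_prime}:
        expected = (five_prime, three_prime).count(enzyme)
        if cassette.count(SITES[enzyme]) != expected:
            raise ValueError(f"{enzyme} site would not be unique in the cassette")
    return cassette


if __name__ == "__main__":
    test_orf = "ATGGGATCCAAATGA"  # invented example containing an internal BamHI site
    print(usable_flanking_enzymes(test_orf))
    print(flank(test_orf, "EcoRI", "KpnI"))
```

An ORF that already contains a candidate site, as the test string does for BamHI, would either need a different flanking enzyme or removal of the internal site, for example by the site-directed mutagenesis mentioned above.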
Methods of cloning and expressing large nucleic acids such as gene clusters,
including PKS- or NRPS-encoding gene clusters, in cells including Streptomyces are well
known to those of skill in the art (see, e.g., Stutzman-Engwall and Hutchinson (1989) Proc.
Natl. Acad. Sci. USA, 86: 3135-3139; Motamedi and Hutchinson (1987) Proc. Natl. Acad.
Sci. USA, 84: 4445-4449; Grimm et al. (1994) Gene, 151: 1-10; Kao et al. (1994) Science,
265: 509-512; and Hopwood et al. (1987) Meth. Enzymol., 153: 116-166). In some
examples, nucleic acid sequences of well over 100 kb have been introduced into cells,
including prokaryotic cells, using vector-based methods (see, for example, Osoegawa et al.
(1998) Genomics, 52: 1-8; Woon et al. (1998) Genomics, 50: 306-316; Huang et al. (1996)
Nucl. Acids Res., 24: 4202-4209). In addition, the cloning and expression of C-1027
enediyne is illustrated in Example 1.
C) Host cells.
The vectors described above can be used to express various protein
components of the enediyne, and/or enediyne shunt metabolites, and/or other modified
metabolites for subsequent isolation and/or to provide a biological synthesis of one or more
desired biomolecules (e.g. C-1027 and/or a C-1027 analogue, etc.). Where one or more
proteins of the enediyne biosynthetic gene cluster are expressed (e.g. overexpressed) for
subsequent isolation and/or characterization, the proteins are expressed in any prokaryotic or
eukaryotic cell suitable for protein expression. In one preferred embodiment, the proteins
are expressed in E. coli.
Host cells for the recombinant production of the subject enediynes, enediyne
metabolites, shunt metabolites, etc. can be derived from any organism with the capability of
harboring a recombinant enediyne gene cluster and/or subset thereof. Thus, the host cells of
the present invention can be derived from either prokaryotic or eucaryotic organisms.
Preferred host cells are those of species or strains (e.g. bacterial strains) that naturally
express enediynes. Such host cells include, but are not limited to Actinomycetes,
Actinoplanetes, and Streptomycetes, Actinomadura, Micromonospora, and the like.
Particularly preferred host cells include, but are not limited to Streptomyces globisporus,
Streptomyces lividans, Streptomyces coelicolor, Micromonospora echinospora ssp.
calichensis, Actinomadura verrucosospora, Micromonospora chersina, Streptomyces
carzinostaticus, and Actinomycete L585-6. Other suitable host cells include, but are not
limited to S. verticillus, S. ambofaciens, S. avermitilis, S. azureus, S. cinnamonensis, S.
coelicolor, S. curacoi, S. erythraeus, S. fradiae, S. galilaeus, S. glaucescens, S.
hygroscopicus, S. lividans, S. parvulus, S. peucetius, S. rimosus, S. roseofulvus, S.
thermotolerans, and S. violaceoruber (see, e.g., Hopwood and Sherman (1990) Ann. Rev.
Genet. 24: 37-66; O'Hagan (1991) The Polyketide Metabolites, Ellis Horwood Limited,
etc.).
In certain embodiments, a eukaryotic host cell is preferred (e.g. where certain
glycosylation patterns are desired). Suitable eukaryotic host cells are well known to those of
skill in the art. Such eukaryotic cells include, but are not limited to, yeast cells, insect cells,
plant cells, fungal cells, and various mammalian cells (e.g. COS, CHO, and HeLa cell lines and
various myeloma cell lines).
D) Recovery of the expression product.
Recovery of the expression product (e.g., enediyne, enediyne analogue,
enediyne biosynthetic pathway polypeptide, etc.) is accomplished according to standard
methods well known to those of skill in the art. Thus, for example, where enediyne
biosynthetic gene cluster proteins are to be expressed and isolated, the proteins can be
expressed with a convenient tag to facilitate isolation (e.g. a His6 tag). Other standard
protein purification techniques are suitable and well known to those of skill in the art (see,
e.g., Quadri et al. (1998) Biochemistry 37: 1585-1595; Nakano et al. (1992) Mol. Gen.
Genet. 232: 313-321, etc.).
Similarly, where components (e.g. enediyne biosynthetic cluster ORFs) are used
to synthesize and/or modify various biomolecules (e.g. enediynes, enediyne analogues, shunt
metabolites, etc.), the desired product and/or shunt metabolite(s) are isolated according to
standard methods well known to those of skill in the art (see, e.g., Carreras and Khosla (1998)
Biochemistry 37: 2084-2088; Deutscher (1990) Methods in Enzymology Volume 182: Guide
to Protein Purification, M. Deutscher, ed., etc.).
II. Use of C-1027 open reading frames in directed biosynthesis.
Elements (e.g. open reading frames) of the C-1027 biosynthetic gene cluster
and/or variants thereof can be used in a wide variety of "directed" biosynthetic processes (i.e.
where the process is designed to modify and/or synthesize one or more particular preselected
metabolite(s)). Essentially the entire C-1027 gene cluster can be used to synthesize a C-1027
enediyne and/or a C-1027 enediyne analogue. Individual C-1027 cluster open reading
frames can be used to perform chemical modifications on particular substrates and/or to
synthesize various metabolites. Thus, for example, ORF 6 (C-methyltransferase) can be used
to methylate a carbon, while ORF 7 (N-methyltransferase) can be used to methylate a
nitrogen. ORF 12, an epimerase, can be used to change the conformation of a sugar, and
ORF 8 (an amino transferase) can be used to aminate a suitable substrate. Similarly,
combinations of C-1027 open reading frames can be used to direct the synthesis of various
metabolites (e.g. β-amino acids, deoxysugars, benzoxazolinates, and the like). These
examples are merely illustrative. One of skill in the art, utilizing the information provided
here, can perform literally countless chemical modifications and/or syntheses using either
"native" enediyne biosynthesis metabolites as the substrate molecule, or other molecules
capable of acting as substrates for the particular enzymes in question. Other substrates can
be identified by routine screening. Methods of screening enzymes for specific activity
against particular substrates are well known to those of skill in the art.
The biosyntheses can be performed in vivo, e.g. by providing a host cell
comprising the desired C-1027 gene cluster open reading frames, and/or in vitro, e.g., by
providing the polypeptides encoded by the C-1027 gene cluster ORFs and the appropriate
substrates and/or cofactors.
A) Synthesis of enediynes and enediyne analogues.
In one embodiment, this invention provides for the synthesis of C-1027
enediynes and/or C-1027 analogues or derivatives. In a preferred embodiment, this is
accomplished by providing a cell comprising a C-1027 gene cluster and culturing the cell
under conditions whereby the desired enediyne or enediyne analogue is synthesized. The
cell can be a cell that does not normally synthesize an enediyne and the entire gene cluster
can be transfected into the cell. Alternatively, a cell that typically synthesizes enediynes can
be utilized and all or part of the C-1027 gene cluster can be introduced into the cell.
Enediyne derivatives/analogues can be produced by varying the order of, or
kind of, gene cluster subunits present in the cell, and/or by changing the host cell (e.g. to a
eukaryotic cell that glycosylates the biosynthetic product), and/or by providing altered
metabolites (e.g. adding exogenous aglycones to a host that carries a gene cassette of the
deoxysugar biosynthesis and glycosylation genes for the production of glycosylated
metabolites), etc.
In certain embodiments, the host cell need not be transfected with an entire
C-1027 gene cluster. Rather, various components of a C-1027 gene cluster can be altered
within a cell already harboring a C-1027 cluster. By varying or adding various biosynthetic
open reading frames, C-1027 enediyne variants can be produced.
Standard techniques of molecular biology (gene disruption, gene
replacement, gene supplement) can be used to modulate and/or otherwise alter enediyne
and/or other metabolite (e.g. shunt metabolite) production in an organism that naturally
synthesizes an enediyne (e.g. S. globisporus) or an organism that is modified to synthesize an
enediyne.
In addition, or alternatively, control sequences that alter the expression of
various open reading frames can be introduced that alter the amount and/or timing of
enediyne production. Thus, for example, by placing particular C-1027 open reading frames
under control of a constitutive promoter (ermE*), C-1027 production was increased by as
much as 4-fold (see, e.g., Table 3 and Example 1).
Table 3. Alteration of C-1027 production by engineering the C-1027 biosynthesis gene
cluster.
Strain                    Yield (%)
WT                        100
WT/pKC1139                100
WT/ermE*/ORF 2            >150
WT/ORF 9                  >100
WT/ermE*/ORF 9            <10
WT/ORF 10, 11             >100
WT/ermE*/ORF 10, 11       >100
WT/ORF 9, 10, 11          >400
ORF 2: transmembrane efflux protein; ORF 9: CagA apoprotein; ORF 10: TDP-glucose
synthase; ORF 11: hydroxylase/halogenase
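Merely by way of illustration, the yields of Table 3 can be held in a small lookup that keeps each reported bound (">", "<") alongside its value, so that a bound is never mistaken for an exact measurement. The following Python sketch is not part of the described methods; the names YIELDS, fold_change_bound and exceeds are hypothetical and are introduced only for this example.

```python
# Relative C-1027 yields transcribed from Table 3 (percent of wild type).
# Each entry keeps the comparator reported in the table, because most
# values are bounds (">150", "<10") rather than exact measurements.
YIELDS = {
    "WT": ("=", 100),
    "WT/pKC1139": ("=", 100),
    "WT/ermE*/ORF 2": (">", 150),
    "WT/ORF 9": (">", 100),
    "WT/ermE*/ORF 9": ("<", 10),
    "WT/ORF 10, 11": (">", 100),
    "WT/ermE*/ORF 10, 11": (">", 100),
    "WT/ORF 9, 10, 11": (">", 400),
}


def fold_change_bound(strain, reference="WT"):
    """Return (comparator, fold) for a strain relative to the reference strain."""
    comparator, value = YIELDS[strain]
    _, reference_value = YIELDS[reference]
    return comparator, value / reference_value


def exceeds(strain, threshold):
    """True only when the table guarantees a yield above `threshold` percent."""
    comparator, value = YIELDS[strain]
    if comparator == ">":
        return value >= threshold
    if comparator == "=":
        return value > threshold
    return False  # an upper bound ("<") guarantees nothing about exceeding


improved = [strain for strain in YIELDS if exceeds(strain, 100)]
```

Under these assumptions, the entry for WT/ORF 9, 10, 11 corresponds to the "as much as 4-fold" increase noted above, expressed as a lower bound.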
Where enediyne analogues are synthesized, it will often prove desirable to
assay them for biological activity. Such assays are well known to those of skill in the art.
One such assay is illustrated in Example 1. Briefly, this example depicts an assay of
antibacterial activity against M. luteus as described by Hu et al. (1988) J. Antibiot. 41:
1575-1579. Other suitable assays for enediyne activity will be known to those of skill in the art.
B) Use of C-1027 open reading frames to synthesize an enediyne core.
The C-1027 open reading frames described herein, or variants thereof, can be
used to synthesize an enediyne core, e.g., from a fatty acid precursor. One such synthetic
pathway is illustrated in Figure 4. This reaction scheme utilizes ORF 17 (epoxide hydrase),
ORF 20 (monooxygenase), ORF 21 (iron-sulfur flavoprotein), ORF 29 (P-450 hydroxylase),
ORF 30 (oxidoreductase), ORF 32 (oxidoreductase), ORF 35 (proline oxidase), and ORF 38
(P-450 hydroxylase) to synthesize an enediyne core.
This synthetic pathway is not considered limiting, but merely illustrative.
Using this as a model, one of ordinary skill in the art can design numerous other synthetic
schemes to produce enediyne cores and/or core variants.
C) Use of C-1027 open reading frames to synthesize deoxy sugars.
The biosynthesis of various deoxy sugars (e.g., deoxyhexoses) typically shares
a common key intermediate, 4-keto-6-deoxyglucose nucleoside diphosphate or its analogs,
whose formation from glucose nucleoside diphosphate is catalyzed by the NGDH enzyme,
an NAD+-dependent oxidoreductase (Liu and Thorson (1994) Ann. Rev. Microbiol. 48: 223-
256; Piepersberg (1997) pp. 81-163. In Biotechnology of antibiotics, 2nd ed. W. R. Strohl
(ed). Marcel Dekker, New York.). Similarly, the C-1027 gene cluster includes an NGDH
enzyme which can be exploited to synthesize a variety of deoxy sugars.
One illustrative synthetic pathway is shown in Figure 2. This biosynthetic
scheme utilizes ORF 10 (dNDP-glucose synthase), ORF 1 (dNDP-glucose dehydratase),
ORF 12 (epimerase), ORF 8 (aminotransferase), ORF 6 (C-methyltransferase), ORF 7
(N-methyltransferase) and ORF 19 (glycosyl transferase).
This synthetic pathway is not considered limiting, but merely illustrative.
Using this as a model, one of ordinary skill in the art can design numerous other synthetic
schemes to produce various deoxy sugars.
D) Use of C-1027 open reading frames to synthesize β-amino acids.
In still another embodiment, C-1027 biosynthetic polypeptides can be used in
the biosynthesis of β-amino acids. One illustrative synthetic pathway is shown in Figure 3A.
This biosynthetic scheme utilizes ORF 4 (hydroxylase), ORF 11 (hydroxylase/halogenase),
ORF 24 (aminomutase), ORF 23 (type II NRPS condensation enzyme), ORF 25 (type II
NRPS adenylation enzyme), and ORF 26 (type II peptidyl carrier protein).
Again, this synthetic pathway is not considered limiting, but merely
illustrative. Using this as a model, one of ordinary skill in the art can design numerous other
synthetic schemes to produce other β-amino acids.
E) Use of C-1027 open reading frames to synthesize benzoxazolinates.
The C-1027 open reading frames can also be used to synthesize a
benzoxazolinate. One illustrative synthetic pathway is shown in Figure 3B. This
biosynthetic scheme utilizes ORF 15 (anthranilate synthase I), ORF 16 (anthranilate synthase
II), ORF 4 (phenol hydroxylase/chlorophenol-4-monooxygenase), ORF 11
(hydroxylase/halogenase), ORF 28 (O-methyltransferase), ORF 3 (coenzyme F390
synthetase), ORF 14 (coenzyme F390 synthetase), and ORF 13 (O-acyltransferase). Again,
this synthetic pathway is not considered limiting, but merely illustrative. Using this as a
model, one of ordinary skill in the art can design numerous other synthetic schemes to
produce other benzoxazolinates.
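By way of illustration only, the ORF assignments recited in sections B) through E) above can be collected into a simple lookup, which makes it easy to see which open reading frames are recited for more than one illustrative pathway. The sketch below is in Python; the structure PATHWAYS and the helper names are hypothetical, and only the ORF numbers and figure references come from the text.

```python
# ORF numbers recited for each illustrative pathway, in the order given in the text.
PATHWAYS = {
    "enediyne core (Figure 4)": [17, 20, 21, 29, 30, 32, 35, 38],
    "deoxy sugar (Figure 2)": [10, 1, 12, 8, 6, 7, 19],
    "beta-amino acid (Figure 3A)": [4, 11, 24, 23, 25, 26],
    "benzoxazolinate (Figure 3B)": [15, 16, 4, 11, 28, 3, 14, 13],
}


def shared_orfs(first, second):
    """Return the sorted ORF numbers recited for both named pathways."""
    return sorted(set(PATHWAYS[first]) & set(PATHWAYS[second]))


def pathways_using(orf):
    """Return the names of every pathway that recites the given ORF number."""
    return [name for name, orfs in PATHWAYS.items() if orf in orfs]


overlap = shared_orfs("beta-amino acid (Figure 3A)", "benzoxazolinate (Figure 3B)")
```

As recited above, ORF 4 and ORF 11 appear in both the β-amino acid and the benzoxazolinate schemes, while the enediyne core and deoxy sugar schemes share no ORFs.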
III. Generation of chemical diversity.
In addition to the directed modification and/or biosynthesis of various
metabolites as described above, the C-1027 biosynthetic gene cluster open reading frames
can be utilized, by themselves or in combination with other biosynthetic subunits (e.g. NRPS
and/or PKS modules and/or enzymatic domains of other PKS and/or NRPS systems) to
produce a wide variety of compounds including, but not limited to, various enediynes or
enediyne derivatives, various polyketides, polypeptides, polyketide/polypeptide hybrids,
various thiazoles, various sugars, various methylated polypeptides/polyketides, and the like.
As with the directed production of various metabolites described above, such
compounds can be produced, in vivo or in vitro, by catalytic biosynthesis, e.g., using large
enediyne cluster units and/or modular PKSs, NRPSs, and hybrid PKS/NRPS systems. In a
preferred embodiment, large combinatorial libraries of cells harboring various
megasynthetases can be produced by the random or directed modification of particular
pathways and then selected for the production of a molecule or molecules of interest. It will
be appreciated that, in certain embodiments, such libraries of megasynthetases/modified
pathways can be used to generate large, complex combinatorial libraries of compounds
which themselves can be screened for a desired activity.
Such combinatorial libraries can be created by the deliberate
modification/variation of selected biosynthetic pathways and/or by random/haphazard
modification of such pathways.
A) Directed engineering of novel synthetic pathways.
In numerous embodiments of this invention, novel polyketides, polypeptides,
and combinations thereof are created by modifying the enediyne gene cluster ORFs and/or
known PKSs, and/or NRPSs so as to introduce variations into metabolites synthesized by the
enzymes. Such variations may be introduced by design, for example to modify a known
molecule in a specific way, e.g. by replacing a single monomeric unit within a polymer with
another, thereby creating a derivative molecule of predicted structure. Such variations can
also be made by adding one or more modules or enzymatic domains to a known PKS or
NRPS or enediyne cluster, or by removing one or more modules from a known PKS or
NRPS.
Using any of these methods, it is possible to introduce PKS domains, NRPS
domains, and enediyne domains into a megasynthetase. Mutations can be made to the
native enediyne, and/or NRPS, and/or PKS subunit sequences and such mutants used in
place of the native sequence, so long as the mutants are able to function with other subunits
(domains) in the synthetic pathway. Such mutations can be made to the native sequences
using conventional techniques such as by preparing synthetic oligonucleotides including the
mutations and inserting the mutated sequence into the gene encoding a NRPS and/or PKS
subunit using restriction endonuclease digestion (see, e.g., Kunkel (1985) Proc. Natl. Acad.
Sci. USA 82: 448; Geisselsoder et al. (1987) BioTechniques 5: 786). Alternatively, the
mutations can be effected using a mismatched primer (generally 10-20 nucleotides in length)
which hybridizes to the native nucleotide sequence (generally cDNA corresponding to the
RNA sequence), at a temperature below the melting temperature of the mismatched duplex.
The primer can be made specific by keeping primer length and base composition within
relatively narrow limits and by keeping the mutant base centrally located (Zoller and Smith
(1983) Meth. Enzymol. 100: 468). Primer extension is effected using DNA polymerase, the
product is cloned, and clones containing the mutated DNA, derived by segregation of the
primer-extended strand, are selected. Selection can be accomplished using the mutant primer
as a hybridization probe. The technique is also applicable for generating multiple point
mutations (see, e.g., Dalbie-McFarland et al. (1982) Proc. Natl. Acad. Sci. USA 79: 6409).
PCR mutagenesis will also find use for effecting the desired mutations.
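Merely as an illustrative sketch of the mismatched-primer approach described above, the following Python code places a single substituted base as near to the center of the primer as possible and estimates a melting temperature. The melting-temperature estimate uses the common 2 °C per A/T plus 4 °C per G/C rule of thumb for short oligonucleotides, which is an assumption of this example and is not taken from the cited references. The function names and the template sequence are hypothetical. For simplicity the primer is written in the same sense as the supplied strand, i.e. it would anneal to the complementary strand.

```python
def wallace_tm(primer):
    """Rule-of-thumb melting temperature (deg C): 2*(A+T) + 4*(G+C).

    A rough estimate for short oligonucleotides only; it ignores the
    destabilizing effect of the mismatch itself.
    """
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def mutagenic_primer(template, position, new_base, length=17):
    """Return (primer, offset) for a single-base substitution.

    The primer matches `template` except at `position` (0-based), and the
    window is chosen so the substituted base sits as centrally as possible.
    """
    if not 0 <= position < len(template):
        raise ValueError("position lies outside the template")
    if length > len(template):
        raise ValueError("primer is longer than the template")
    # Center the window on the mutation, then clamp it to the template ends.
    start = position - length // 2
    start = max(0, min(start, len(template) - length))
    window = template[start:start + length]
    offset = position - start
    if window[offset] == new_base:
        raise ValueError("new_base equals the native base; nothing to mutate")
    primer = window[:offset] + new_base + window[offset + 1:]
    return primer, offset


# Hypothetical template, used only to exercise the functions.
TEMPLATE = "ATGGCTAGCTAGGATCCGATCGTACGATCGA"
PRIMER, OFFSET = mutagenic_primer(TEMPLATE, position=15, new_base="T")
```

In this sketch a 17-nucleotide primer falls within the 10-20 nucleotide range given above, and the mismatch lands at the ninth of 17 positions; hybridization would then be carried out below the estimated melting temperature.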
B) Random modification of enediyne pathways.
In another embodiment, variations can be made randomly, for example by
making a library of molecular variants (e.g. of a known enediyne) by randomly mutating one
or more elements of the subject gene cluster or by randomly replacing one or more open
reading frames in a gene cluster with one or more alternative open reading frames.
The various open reading frames can be combined into a single multi-modular
enzyme, thereby dramatically increasing the number of possible combinations obtained using
these methods. These combinations can be made using standard recombinant or nucleic acid
amplification methods, for example by shuffling nucleic acid sequences encoding various
modules or enzymatic domains to create novel arrangements of the sequences, analogous to
DNA shuffling methods described in Crameri et al. (1998) Nature 391: 288-291, and in U.S.
Patents 5,605,793 and 5,837,458. In addition, novel combinations can be made in vitro,
for example by combinatorial synthetic methods. Novel molecules or molecule libraries can
be screened for any specific activity using standard methods.
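To illustrate why combining interchangeable elements increases the number of possible combinations so sharply, the following Python sketch enumerates every arrangement obtained by choosing one alternative per position. It is an assumption-laden sketch only: the notion of a "slot" and the grouping of ORFs into slots are hypothetical and do not assert that these particular ORFs are functionally interchangeable.

```python
from itertools import product


def library_size(slots):
    """Number of distinct combinations: the product of the alternatives per slot."""
    size = 1
    for alternatives in slots.values():
        size *= len(alternatives)
    return size


def enumerate_combinations(slots):
    """Yield every combination as a dict mapping slot name to the chosen element."""
    names = list(slots)
    for choice in product(*(slots[name] for name in names)):
        yield dict(zip(names, choice))


# Hypothetical grouping, used only to show the combinatorial growth.
SLOTS = {
    "methyltransferase": ["ORF 6", "ORF 7"],
    "hydroxylase": ["ORF 4", "ORF 11", "ORF 29"],
}
COMBINATIONS = list(enumerate_combinations(SLOTS))
```

Each additional slot multiplies the library size by its number of alternatives, so that, for example, ten slots with three alternatives each would already give 3^10 = 59,049 arrangements.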
Random mutagenesis of the nucleotide sequences obtained as described above
can be accomplished by several different techniques known in the art, such as by altering
sequences within restriction endonuclease sites, inserting an oligonucleotide linker randomly
into a plasmid, by irradiation with X-rays or ultraviolet light, by incorporating incorrect
nucleotides during in vitro DNA synthesis, by error-prone PCR mutagenesis, by preparing
synthetic mutants or by damaging plasmid DNA in vitro with chemicals. Chemical mutagens
include, for example, sodium bisulfite, nitrous acid, hydroxylamine, agents which damage or
remove bases thereby preventing normal base-pairing such as hydrazine or formic acid,
analogues of nucleotide precursors such as nitrosoguanidine, 5-bromouracil, 2-aminopurine,
or acridine intercalating agents such as proflavine, acriflavine, quinacrine, and the like.
Generally, plasmid DNA or DNA fragments are treated with chemicals, transformed into E.
coli and propagated as a pool or library of mutant plasmids.
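Purely as an in silico illustration of a pool of random variants, the following Python sketch substitutes each base independently with a fixed probability. This uniform substitution model is a simplifying assumption of the example; it does not reproduce the mutational spectrum of any particular mutagen or of error-prone PCR, and the function names and input sequence are hypothetical.

```python
import random

BASES = "ACGT"


def mutate(sequence, rate, rng):
    """Return a copy of `sequence` with each base substituted with probability `rate`.

    A substituted base is always replaced by one of the three other bases.
    The sequence is assumed to contain only uppercase A, C, G and T.
    """
    mutated = []
    for base in sequence:
        if rng.random() < rate:
            mutated.append(rng.choice([b for b in BASES if b != base]))
        else:
            mutated.append(base)
    return "".join(mutated)


def mutant_pool(sequence, rate, size, seed=0):
    """Build a reproducible pool of `size` variants of `sequence`."""
    rng = random.Random(seed)  # seeded so the pool can be regenerated
    return [mutate(sequence, rate, rng) for _ in range(size)]


# Hypothetical input: 1,000 variants at roughly 1 substitution per 100 bases.
POOL = mutant_pool("ATGGCTAGCTAGGATCCGATCGTACGATCGA", rate=0.01, size=1000)
```

Such a simulated pool can be used, for example, to estimate what fraction of a library of a given size is expected to remain unmutated at a chosen substitution rate.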
Large populations of random enzyme variants can be constructed in vivo
using "recombination-enhanced mutagenesis." This method employs two or more pools of,
for example, 10^6 mutants each of the wild-type encoding nucleotide sequence that are
generated using any convenient mutagenesis technique, described more fully above, and then
inserted into cloning vectors.
C) Incorporation and/or modification of non-C-1027 cluster elements.
In either the directed or random approaches, nucleic acids encoding novel
combinations of gene cluster ORFs are introduced into a cell. In one embodiment, nucleic
acids encoding one or more enediyne synthetic cluster ORFs and/or PKS and/or NRPS
domains are introduced into a cell so as to replace one or more domains of an endogenous
gene cluster within a cell. Endogenous gene replacement can be accomplished using
standard methods, such as homologous recombination. Nucleic acids encoding an entire
enediyne, enediyne ORF, PKS, NRPS, or combination thereof can also be introduced into a
cell so as to enable the cell to produce the novel enzyme, and, consequently, synthesize the
novel polymer. In a preferred embodiment, such nucleic acids are introduced into the cell
optionally along with a number of additional genes, together called a 'gene cluster,' that
influence the expression of the genes, survival of the expressing cells, etc. In a particularly
preferred embodiment, such cells do not have any other enediyne and/or PKS- and/or NRPS-
encoding genes or gene clusters, thereby allowing the straightforward isolation of the
molecule(s) synthesized by the genes introduced into the cell.
Furthermore, the recombinant vector(s) can include genes from a single
enediyne and/or PKS and/or NRPS gene cluster, or may comprise hybrid replacement PKS
gene clusters with, e.g., a gene from one cluster replaced by the corresponding gene from
another gene cluster. For example, it has been found that ACPs are readily interchangeable
among different synthases without an effect on product structure. Furthermore, a given KR
can recognize and reduce polyketide chains of different chain lengths. Accordingly, these
genes are freely interchangeable in the constructs described herein. Thus, the replacement
clusters of the present invention can be derived from any combination of PKS and/or NRPS
gene sets that ultimately function to produce an identifiable polyketide.
Examples of hybrid replacement clusters include, but are not limited to,
clusters with genes derived from two or more of the act gene cluster, the whiE gene cluster,
frenolicin (fren), granaticin (gra), tetracenomycin (tcm), 6-methylsalicylic acid (6-msas),
oxytetracycline (otc), tetracycline (tet), erythromycin (ery), griseusin (gris), nanaomycin,
medermycin, daunorubicin, tylosin, carbomycin, spiramycin, avermectin, monensin,
nonactin, curamycin, rifamycin and candicidin synthase gene clusters, among others. (For a
discussion of various PKSs, see, e.g., Hopwood and Sherman (1990) Ann. Rev. Genet. 24:
37-66; O'Hagan (1991) The Polyketide Metabolites, Ellis Horwood Limited.)
A number of hybrid gene clusters have been constructed, having components
derived from the act, fren, tcm, gris and gra gene clusters (see, e.g., U.S. Patent 5,712,146).
Other hybrid gene clusters, as described above, can easily be produced and screened using
the disclosure herein, for the production of identifiable polyketides, polypeptides or
polyketide/polypeptide hybrids.
Host cells (e.g. Streptomyces) can be transformed with one or more vectors,
collectively encoding a functional PKS/NRPS set, or a cocktail comprising a random
assortment of enediyne ORFs and/or PKS and/or NRPS genes, modules, active sites, or
portions thereof. The vector(s) can include native or hybrid combinations of enediyne ORFs,
and/or PKS and/or NRPS subunits or cocktail components, or mutants thereof. As explained
above, the gene cluster need not correspond to the complete native gene cluster but need only
encode the necessary enediyne ORFs and/or PKS and/or NRPS components to catalyze the
production of the desired product(s).
IV. Variation of starter and/or extender units, and/or host cells.
In addition to varying the nucleic acids comprising the subject gene cluster,
variations in the products produced by the gene cluster(s) can be obtained by varying the
host cell, the starter units and/or the extender units. Thus, for example, different fatty acids
can be utilized in the enediyne synthetic pathway, resulting in different enediyne variants.
Similarly, different intermediate metabolites can be provided (e.g. endogenously produced by
the host cell, or produced by an introduced heterologous construct, and/or supplied from an
exogenous source (e.g. the culture media)). Similarly, varying the host cell can vary the
resulting product(s). For example, a gene cassette carrying the enediyne biosynthesis genes
can be introduced into a deoxysugar-synthesizing host for the production of glycosylated
enediyne metabolites.
V. Use of C-1027 resistance genes.
The antibiotic C-1027 and metabolites present in C-1027 biosynthesis are
highly potent cytotoxins. Accordingly, the biosynthesis of C-1027 is facilitated by the
presence of one or more antibiotic (e.g. enediyne) resistance genes. Without being bound to
a particular theory, it is believed that CagA and SgcB function cooperatively to provide
resistance. It is believed that the C-1027 chromophore is first sequestered by binding to the
preapoprotein CagA (ORF 9) to form a complex, which is then transported out of the cell by
the efflux pump SgcB (ORF 2) and processed by removing the leader peptide to yield the
chromoprotein. Other genes that appear to mediate resistance in the C-1027 biosynthesis
gene cluster include a transmembrane transport protein (ORF 27), a Na+/H+ transporter
(ORF 0), an ABC transporter (ORF -1, C-terminus), a glycerol phosphate transporter (ORF -2),
and a UvrA-like protein (ORF -1, N-terminus) (see, e.g., Table 2).
These ORFs and/or the polypeptides encoded by these ORFs can be utilized
alone, or in combination with one or more other C-1027 ORFs, to confer resistance to
enediyne or enediyne metabolites on a cell. This is useful in a wide variety of contexts, for
example, to increase production of enediynes. It is believed that C-1027
resistance could be a limiting factor at the onset of C-1027 production. Provision of an extra
copy of the plasmid-borne sgcB, and overexpression of sgcB under the control of the
constitutive ermE* promoter, resulted in an increase of C-1027 production (see Example 1).
In a therapeutic context, it is sometimes desirable to confer resistance on
certain vulnerable cells. Thus, for example, where an enediyne is used as a
chemotherapeutic, transfection of vulnerable, but healthy cells (e.g. liver cells remote from
the tumor site, stem cells, etc.) with vector(s) expressing the resistance gene(s) permits
administration of the enediyne at a higher dosage with fewer adverse effects to the organism.
Such approaches have been taken using the multi-drug resistance gene (MDR1) expressing
P-glycoprotein.
In another embodiment vectors are provided containing one or more
resistance genes of this invention under control of a constitutive and/or inducible promoter
thereby providing a "ready-made" expression system suitable for the expression of an
enediyne or enediyne metabolite at high concentration.
It is also noted that the resistance genes are expected to confer resistance to
compounds other than enediynes. The resistance genes are expected to confer resistance to
essentially any cytotoxic compound that can act as a substrate for the resistance gene(s) of
this invention.
VI. Kits.
In still another embodiment, this invention provides kits for practice of the
methods described herein. In one preferred embodiment, the kits comprise one or more
containers containing nucleic acids encoding one or more of the C-1027 biosynthesis gene
cluster open reading frames. Certain kits may comprise vectors encoding the sgc gene
cluster ORFs and/or cells containing such vectors. The kits may optionally include any
reagents and/or apparatus to facilitate practice of the methods described herein. Such
reagents include, but are not limited to, buffers, labels, labeled antibodies, bioreactors, cells,
etc.
In addition, the kits may include instructional materials containing directions
(i.e., protocols) for the practice of the methods of this invention. Preferred instructional
materials provide protocols utilizing the kit contents for creating or modifying a C-1027 gene
cluster and/or for synthesizing or modifying a molecule using one or more sgc gene cluster
ORFs. While the instructional materials typically comprise written or printed materials, they
are not limited to such. Any medium capable of storing such instructions and
communicating them to an end user is contemplated by this invention. Such media include,
but are not limited to, electronic storage media (e.g., magnetic discs, tapes, cartridges, chips),
optical media (e.g., CD ROM), and the like. Such media may include addresses to internet
sites that provide such instructional materials.
EXAMPLES
The following examples are offered to illustrate, but not to limit the claimed
invention.
Example 1
Genes for production of the enediyne antitumor antibiotic C-1027 in Streptomyces
globisporus are clustered with the cagA gene that encodes the C-1027 apoprotein
We have been studying the biosynthesis of C-1027 in Streptomyces
globisporus C-1027 as a model for the enediyne family of antitumor antibiotics (Thorson et
al. (1999) Bioorg. Chem., 27: 172-188). C-1027 consists of a non-peptidic chromophore and
an apoprotein, CagA [also called C-1027AG (Otani et al. (1991) Agri. Biol. Chem. 55: 407-
417)]. The C-1027 chromophore is extremely unstable in the protein-free state; its structure
was initially deduced from an inactive but more stable degradation product
(Minami et al. (1993) Tetrahedron Lett. 34: 2633-2636) and subsequently confirmed by
spectroscopic analysis of the natural product (Yoshida et al. (1993) Tetrahedron Lett. 34:
2637-2640) (Fig. 1). While the absolute stereochemistry of the deoxy sugar moiety was
established by total synthesis (Iida et al. (1993) Tetrahedron Lett. 34: 4079-4082), the 8S,
9S, 13S and 17R configurations of the C-1027 chromophore were based only on computer
modeling (Okuno et al. (1994) J. Med. Chem. 37: 2266-2273). Although no biosynthetic
study has been carried out specifically on C-1027, the polyketide origin of the enediyne
cores has been implicated by feeding experiments with 13C-labeled acetate for the
neocarzinostatin chromophore A (Hensens et al. (1989) J. Am. Chem. Soc. 111: 3295-3299),
dynemicin (Tokiwa et al. (1992) J. Am. Chem. Soc. 114: 4107-4110), and esperamicin (Lam
et al. (1993) J. Am. Chem. Soc. 115: 12340-12345); and deoxysugar biosynthesis has been
well characterized in actinomycetes (Liu and Thorson (1994) Ann. Rev. Microbiol. 48: 223-
256; Piepersberg (1997) pp. 81-163. In Biotechnology of antibiotics, 2nd ed. W. R. Strohl
(ed). Marcel Dekker, New York). Given the structural similarity of C-1027 to the other
enediyne cores and to deoxysugars found in other secondary metabolites, we decided to
clone either a PKS or a deoxysugar biosynthesis gene as the first step of identifying the
C-1027 gene cluster from S. globisporus.
Furthermore, the CagA apoprotein of C-1027 has been isolated, its amino acid
sequence has been determined, and the corresponding cagA gene has been cloned and
sequenced (Otani et al. (1991) Agri. Biol. Chem. 55: 407-417; Sakata et al. (1992) Biosci.
Biotech. Biochem. 56: 1592-1595). Since genes encoding secondary metabolite production
in actinomycetes have invariably been found to be clustered in one region of the microbial
chromosome (Hopwood (1997) Chem. Rev. 97: 2465-2497), we further reasoned that
mapping the cagA gene with either a putative PKS gene, a deoxysugar biosynthesis gene, or
both to the same region of the S. globisporus chromosome should be viewed as strong
evidence supporting the proposition that the cloned genes constitute the C-1027 biosynthesis
gene cluster.
We report here the cloning and sequencing of two genes, sgcA (Streptomyces
globisporus C-1027) and sgcB, that encode a dNDP-glucose 4,6-dehydratase (NGDH) and a
transmembrane efflux protein, respectively. The sgcA,B locus is indeed clustered with the
cagA gene, leading to the localization of a 75-kb gene cluster from S. globisporus. The
involvement of the cloned gene cluster in C-1027 biosynthesis was demonstrated by
disrupting the sgcA gene to generate C-1027-nonproducing mutants and by complementing
the sgcA mutants in vivo to restore C-1027 production. Our results, together with similar
efforts in the Thorson laboratory on the calicheamicin gene cluster (Thorson et al. (1999)
Bioorg. Chem., 27: 172-188), represent the first cloning of a gene cluster for enediyne
antitumor antibiotic biosynthesis.
Materials and methods.
Bacterial strains and plasmids.
Escherichia coli DH5α was used as a general host for routine subcloning
(Sambrook et al. (1989) Molecular cloning, a laboratory manual. Cold Spring Harbor
Laboratory, Cold Spring Harbor, NY). E. coli XL1-Blue MR (Stratagene, La Jolla, CA)
was used as the transduction host for cosmid library construction. E. coli S17-1 was used as
the donor host for E. coli-S. globisporus conjugation (Mazodier et al. (1989) J. Bacteriol.
171: 3583-3585). Micrococcus luteus ATCC 9431 was used as the testing organism to assay
the antibacterial activity of C-1027 (Hu et al. (1988) J. Antibiot. 41: 1575-1579). The
pGEM-3zf, -5zf, and -7zf and pGEM-T vectors were from Promega (Madison, WI). S.
globisporus strains and other plasmids in this study are listed in Table 3.
Table 3. Strains and plasmids.
Strain or plasmid   Relevant characteristics
S. globisporus:
C-1027     Wild-type (Hu et al. (1988) J. Antibiot. 41: 1575-1579)
AF40       Mutant resulting from acriflavine treatment of S. globisporus C-1027,
           C-1027-nonproducing (Mao et al. (1997) Chinese J. Biotechnol. 13: 195-199)
AF44       Mutant resulting from acriflavine treatment of S. globisporus C-1027,
           C-1027-nonproducing (Mao et al., supra)
AF67       Mutant resulting from acriflavine treatment of S. globisporus C-1027,
           C-1027-nonproducing (Mao et al., supra)
SB1001     sgcA-disrupted mutant resulting from integration of pBS1012 into
           S. globisporus C-1027, AprR, C-1027-nonproducing
SB1002     sgcA-disrupted mutant resulting from integration of pBS1013 into
           S. globisporus C-1027, AprR, C-1027-nonproducing
Plasmids:
pOJ446     E. coli-Streptomyces shuttle cosmid, AprR (Bierman et al. (1992) Gene, 116: 43-
pOJ260     E. coli vector, non-replicating in Streptomyces, AprR (Bierman et al., supra)
pKC1139    E. coli-Streptomyces shuttle vector, repTS, AprR (Bierman et al., supra)
pWHM3      E. coli-Streptomyces shuttle vector, ThR (Vara et al. (1989) J. Bacteriol. 171:
           5872-5881)
pWHM79     ermE* promoter in pGEM-3zf (Shen and Hutchinson (1996) Proc. Natl. Acad.
           Sci. USA 93: 6600-6604)
pBS1001    0.75-kb PCR product amplified from S. globisporus with type I PKS primers in
           pGEM-T
pBS1002    0.55-kb PCR product amplified from S. globisporus with NGDH gene primers in
           pGEM-T
pBS1003    0.73-kb PCR product amplified from pBS1005 with cagA primers in pGEM-T
pBS1004    pOJ446 S. globisporus genomic library cosmid
pBS1005    pOJ446 S. globisporus genomic library cosmid
pBS1006    pOJ446 S. globisporus genomic library cosmid
pBS1007    3.0-kb BamHI fragment from pBS1005 in pGEM-3zf, sgcA, sgcB
pBS1008    4.0-kb BamHI fragment from pBS1005 in pGEM-3zf, cagA
pBS1009    1.0-kb KpnI truncated fragment of sgcA from pBS1007 in pGEM-3zf
pBS1010    0.75-kb SacII/SphI internal fragment of sgcA from pBS1009 in pGEM-5zf
pBS1011    0.75-kb SacII/SphI internal fragment of sgcA from pBS1010 in pGEM-3zf
pBS1012    0.75-kb EcoRI/HindIII internal fragment of sgcA from pBS1010 in pOJ260
pBS1013    0.75-kb EcoRI/HindIII internal fragment of sgcA from pBS1010 in pKC1139
pBS1014    2.0-kb EcoRI/SphI fragment from pBS1007 in the SmaI/SphI sites of pWHM79,
           ermE* sgcA
pBS1015    2.5-kb EcoRI/HindIII fragment from pBS1014 in pWHM3, ermE* sgcA
pBS1016    Self-ligation of the 5.2-kb KpnI fragment from pBS1007
pBS1017    0.45-kb EcoRI/SacI fragment from pWHM79 in EcoRI/SacI sites of pBS1016,
           ermE* sgcB
pBS1018    2.5-kb EcoRI/HindIII fragment from pBS1017 in pKC1139, ermE* sgcB
Biochemicals and chemicals.
Ampicillin, apramycin, nalidixic acid, and thiostrepton were from Sigma (St.
Louis, MO). Unless specified otherwise, restriction enzymes and other molecular biology
reagents were from standard commercial sources.
Media and culture conditions.
E. coli strains carrying plasmids were grown in Luria-Bertani (LB) medium
and were selected with appropriate antibiotics. S. globisporus strains were grown on ISP-4
(Difco Laboratories, Detroit, MI) or R2YE at 28°C for sporulation and in TSB (Hopwood et
al. (1985) Genetic manipulation of Streptomyces: a laboratory manual. John Innes
Foundation, Norwich, UK) supplemented with 5 mM MgCl2 and 0.5% glycine at 28°C, 250
rpm for isolation of genomic DNA. For transformation, S. globisporus strains were grown in
YEME (Hopwood et al., supra) for preparation of protoplasts and on R2YE for protoplast
regeneration. For conjugation, both the E. coli S17-1 donors and the S. globisporus
recipients (upon germination in TSB) were prepared in LB, and donors/recipients were
grown on either ISP-4 medium with 0.05% yeast extract and 0.1% tryptone or AS-1 medium
(Baltz (1980) Dev. Ind. Microbiol. 21: 43-54; Bierman et al. (1992) Gene 116: 43-69) at
30°C for isolation of exconjugants.
For C-1027 production, S. globisporus strains were grown either on R2YE or
ISP-4 agar medium at 28°C or in liquid medium by a two-stage fermentation. For liquid
culture, the seed inoculum was prepared by inoculating 50 mL of medium (consisting of 2%
glycerol, 2% dextrin, 1% fish meal, 0.5% peptone, 0.2% (NH4)2SO4, and 0.2% CaCO3, pH
7.0) with an aliquot of spore suspension and incubating at 28°C, 250 rpm for two days. The
seed culture (5%) was then added to a fresh 50 mL of the same medium, and incubation
continued at 28°C, 250 rpm for three to six days (Hu et al. (1988) J. Antibiot. 41: 1575-
1579). The fermentation supernatants were harvested by centrifugation (Eppendorf 5415C,
4°C, 10 min, 14,000 rpm) on days 3, 4, and 5, and assayed for their antibacterial activity
against M. luteus (Hu et al. (1988) J. Antibiot. 41: 1575-1579).
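The seed-medium recipe above can be turned into weigh-out amounts with a short calculation. The following is a minimal sketch, assuming that every percentage is weight per volume (g per 100 mL) and that the 5% inoculum is volume per volume; the text does not state either convention (glycerol in particular may have been measured by volume), and the function names are illustrative only.

```python
# Assumption: percentages are w/v (g per 100 mL); the source does not say so explicitly.
SEED_MEDIUM_PERCENT = {
    "glycerol": 2.0,
    "dextrin": 2.0,
    "fish meal": 1.0,
    "peptone": 0.5,
    "(NH4)2SO4": 0.2,
    "CaCO3": 0.2,
}

def grams_for(percent_wv, volume_mL):
    """Grams of a component needed for volume_mL at the given % (w/v)."""
    return percent_wv * volume_mL / 100.0

def recipe(volume_mL):
    """Weigh-out amounts (g) for every component of the seed medium."""
    return {name: grams_for(pct, volume_mL) for name, pct in SEED_MEDIUM_PERCENT.items()}

def inoculum_mL(percent_vv, volume_mL):
    """Volume of seed culture for a % (v/v) inoculum (assumed convention)."""
    return percent_vv * volume_mL / 100.0

if __name__ == "__main__":
    for name, grams in recipe(50).items():
        print(f"{name}: {grams:.2f} g per 50 mL")
    print(f"seed culture: {inoculum_mL(5, 50):.1f} mL")
```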
DNA isolation and manipulation.
Plasmid preparation and DNA extraction were carried out by using
commercial kits (Qiagen, Santa Clarita, CA). Total S. globisporus DNA was isolated
according to literature protocols (Hopwood et al. (1985) Genetic manipulation of
Streptomyces: a laboratory manual. John Innes Foundation, Norwich, UK; Rao et al. (1987)
Methods Enzymol. 153: 166-198). Restriction endonuclease digestion and ligation followed
standard methods (Sambrook et al. (1989) Molecular cloning, a laboratory manual. Cold
Spring Harbor Laboratory, Cold Spring Harbor, NY). For Southern analysis, digoxigenin
labeling of DNA probes, hybridization, and detection were performed according to the
protocols provided by the manufacturer (Boehringer Mannheim Biochemicals, Indianapolis,
IN).
DNA sequencing.
Automated DNA sequencing was carried out on an ABI Prism 377 DNA
Sequencer using the ABI Prism dye terminator cycle sequencing ready reaction kit and
AmpliTaq DNA polymerase FS (Perkin-Elmer/ABI, Foster City, CA). Sequencing service
was provided by either the DBS Automated DNA Sequencing Facility, UC Davis, or Davis
Sequencing Inc. (Davis, CA). Data were analyzed by ABI Prism Sequencing 2.1.1 software
and the Genetics Computer Group program (Madison, WI).
Polymerase chain reaction (PCR).
Primers were synthesized at the Protein Structure Laboratory, UC Davis.
PCR was carried out on a GeneAmp PCR System 2400 (Perkin-Elmer/ABI) with Taq
polymerase and buffer from Promega. A typical PCR mixture consisted of 5 ng of S.
globisporus genomic or plasmid DNA as template, 25 pmol of each primer, 25 µM dNTP,
5% DMSO, 2 units of Taq polymerase, and 1 x buffer, with or without 20% glycerol, in a final
volume of 50 µL. The PCR temperature program was as follows: initial denaturation at 94°C
for 5 min; 24 to 36 cycles of 45 sec at 94°C, 1 min at 60°C, and 2 min at 72°C; followed by
an additional 7 min at 72°C.
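As a quick check on the reaction set-up described above, the amounts can be converted into final concentrations and a nominal run time. This is a minimal sketch: the function names are illustrative, and the run time counts hold times only (ramp times between temperatures are ignored, which is an assumption).

```python
def primer_conc_uM(pmol, volume_uL):
    """Final primer concentration; pmol per µL is numerically equal to µM."""
    return pmol / volume_uL

def template_conc_ng_per_uL(ng, volume_uL):
    """Final template concentration in ng/µL."""
    return ng / volume_uL

def program_minutes(cycles, initial_min=5.0, step_seconds=(45, 60, 120), final_min=7.0):
    """Total hold time of the program (ramp times ignored by assumption)."""
    return initial_min + cycles * sum(step_seconds) / 60.0 + final_min

if __name__ == "__main__":
    print("primer:", primer_conc_uM(25, 50), "µM each")
    print("template:", template_conc_ng_per_uL(5, 50), "ng/µL")
    for n in (24, 36):
        print(n, "cycles:", program_minutes(n), "min")
```

With the stated values, 25 pmol of primer in 50 µL corresponds to 0.5 µM, and the program holds for roughly 1.7 to 2.5 hours depending on the cycle number.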
For type II PKS, the following two pairs of degenerate primers were used:
5'-AGC TCC ATC AAG TCS ATG RTC GG-3' (forward, SEQ ID NO:103) / 5'-CC GGT
GTT SAC SGC GTA GAA CCA GGC G-3' (reverse, SEQ ID NO:104) and 5'-GAC ACV
GCN TGY TCB TCV-3' (forward, SEQ ID NO:105) / 5'-RTG SGC RTT VGT NCC RCT-3'
(reverse, SEQ ID NO:106) (B, C+G+T; N, A+C+G+T; R, A+G; S, C+G; V, A+C+G; Y, C+T)
(Seow et al. (1997) J. Bacteriol. 179: 7360-7368). No product was amplified
under any of the conditions tested. For type I PKS, the following pair of degenerate primers
was used: 5'-GCS TCC CGS GAC CTG GGC TTC GAC TC-3' (forward, SEQ ID NO:107) /
5'-AG SGA SGA SGA GCA GGC GGT STC SAC-3' (reverse, SEQ ID NO:108) (S, G+C)
(Kakavas et al. (1997) J. Bacteriol. 179: 7515-7522). A distinctive product with the
predicted size of 0.75 kb was amplified in the presence of 20% glycerol and cloned into
pGEM-T according to the protocol provided by the manufacturer (Promega) to yield
pBS1001.
For NGDH, the following pair of degenerate primers was used: 5'-CS GGS
GSS GCS GGS TTC ATC GG-3' (forward, SEQ ID NO:109) / 5'-GG GWR CTG GYR
SGG SCC GTA GTT G-3' (reverse, SEQ ID NO:110) (R, A+G; S, C+G; W, A+T; Y, C+T)
(Decker et al. (1996) FEMS Lett. 141: 195-201). A distinctive product with the predicted
size of 0.55 kb was amplified and cloned into pGEM-T to yield pBS1002.
For cagA, the following pair of primers, flanking its coding region, was
used: 5'-AG GTG GAG GCG CTC ACC GAG-3' (forward, SEQ ID NO:111) / 5'-G GGC
GTC AGG CCG TAA GAA G-3' (reverse, SEQ ID NO:112) (Sakata et al. (1992) Biosci.
Biotechnol. Biochem. 56: 1592-1595). A distinctive product with the predicted size of 0.73
kb was amplified from pBS1005 and cloned into pGEM-T to yield pBS1003.
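The degenerate primers listed in this section use single-letter ambiguity codes (B, N, R, S, V, W, Y), so each written primer stands for a pool of concrete sequences. The sketch below counts and enumerates that pool using only the code definitions given in the primer legends; the function names are illustrative and not part of the original protocol.

```python
from itertools import product

# Ambiguity codes exactly as defined in the primer legends of this section.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "B": "CGT", "V": "ACG", "N": "ACGT",
}

def degeneracy(primer):
    """Number of distinct sequences encoded by a degenerate primer."""
    count = 1
    for base in primer.replace(" ", ""):
        count *= len(IUPAC[base])
    return count

def expand(primer):
    """Enumerate every concrete sequence encoded by a degenerate primer."""
    pools = [IUPAC[base] for base in primer.replace(" ", "")]
    return ["".join(combo) for combo in product(*pools)]

if __name__ == "__main__":
    type2_fwd = "GAC ACV GCN TGY TCB TCV"  # SEQ ID NO:105
    type2_rev = "RTG SGC RTT VGT NCC RCT"  # SEQ ID NO:106
    print(degeneracy(type2_fwd), degeneracy(type2_rev))
    print(expand(type2_fwd)[:3])
```

Because the primer amount is split across the whole pool, a highly degenerate primer supplies each individual sequence at a correspondingly lower effective concentration.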
Genomic library construction and screening.
S. globisporus genomic DNA was partially digested with MboI to yield a
smear around 60 kb, as monitored by electrophoresis on a 0.3% agarose gel. This sample
was dephosphorylated by treatment with shrimp alkaline phosphatase and ligated into the
E. coli-Streptomyces shuttle vector pOJ446 (Bierman et al. (1992) Gene 116: 43-69) that was
prepared by digestion with HpaI, shrimp alkaline phosphatase treatment, and additional
digestion with BamHI. The resulting ligation mixture was packaged with the Gigapack II
XL two-component packaging extract (Stratagene). The packaged mixture was transduced
into E. coli XL1-Blue MR. The transduced cells were spread onto LB plates containing
apramycin (100 µg/mL) and incubated at 37°C overnight. The titer of the primary library
was approximately 6,000 colony-forming units per µg of DNA. Restriction enzyme analysis
of twelve randomly selected cosmids confirmed that the average size of inserts was about 35
to 45 kb (Rao et al. (1987) Methods Enzymol. 153: 166-198).
To screen the genomic library, colonies from five LB plates containing
apramycin (100 µg/mL, with approximately 2,000 colonies per plate) were transferred to
nylon transfer membranes (Micro Separations, Inc., Westborough, MA) and screened by
colony hybridization with the PCR-amplified 0.55-kb NGDH fragment from pBS1002 as a
probe. The positive cosmid clones were re-screened by PCR with primers for NGDH and
confirmed by Southern hybridization (Sambrook et al., supra). Further restriction enzyme
mapping and chromosomal walking of these overlapping cosmids led to the genetic
localization of the 75-kb sgc gene cluster, as represented by pBS1004, pBS1005, and
pBS1006 (Fig. 5A). A 3.0-kb BamHI fragment from pBS1005 that hybridized to the NGDH
probe was cloned into the same sites of pGEM-3zf to yield pBS1007. Similarly, a 4.0-kb
BamHI fragment from pBS1005 that hybridized to the PCR-amplified 0.73-kb cagA probe
from pBS1003 was cloned into the same sites of pGEM-3zf to yield pBS1008 (Fig. 5B).
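The number of colonies screened can be related to genome coverage with the standard Clarke-Carbon estimate, N = ln(1 - P) / ln(1 - f), where f is the insert size divided by the genome size. The sketch below is illustrative only: the 40-kb insert is the midpoint of the 35 to 45 kb range reported above, and the 8,000-kb genome size is an assumed round figure for a Streptomyces chromosome, not a value given in this document.

```python
import math

ASSUMED_GENOME_KB = 8000   # assumption: not stated in the source
INSERT_KB = 40             # midpoint of the reported 35 to 45 kb insert range

def fold_coverage(n_clones, insert_kb, genome_kb):
    """Average number of times each genomic position is represented."""
    return n_clones * insert_kb / genome_kb

def clones_needed(probability, insert_kb, genome_kb):
    """Clarke-Carbon estimate of clones needed to include a given locus."""
    fraction = insert_kb / genome_kb
    return math.ceil(math.log(1.0 - probability) / math.log(1.0 - fraction))

if __name__ == "__main__":
    screened = 5 * 2000  # five plates, about 2,000 colonies each
    print("fold coverage:", fold_coverage(screened, INSERT_KB, ASSUMED_GENOME_KB))
    print("clones for P = 0.99:", clones_needed(0.99, INSERT_KB, ASSUMED_GENOME_KB))
```

Under these assumptions the screened colonies far exceed the number needed for 99% probability of recovering any single locus, which is consistent with multiple overlapping positive cosmids being found.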
Generation of sgcA mutants by insert-directed homologous recombination in S.
globisporus.
A 1.0-kb KpnI fragment from pBS1007, containing the C-terminal truncated
sgcA, was subcloned into pGEM-3zf to yield pBS1009. An internal fragment of sgcA was
moved sequentially as a 0.75-kb SacII/SphI fragment from pBS1009 into the same sites of
pGEM-5zf to yield pBS1010 and as a 0.75-kb SacI/SphI fragment from pBS1010 into the
same sites of pGEM-3zf to yield pBS1011. The latter plasmid was digested with EcoRI and
HindIII, and the resulting 0.75-kb EcoRI/HindIII fragment was cloned into the same sites of
pOJ260 and pKC1139 (Bierman et al. (1992) Gene 116: 43-69) to yield pBS1012 and
pBS1013, respectively.
Introduction of pBS1012 and pBS1013 into S. globisporus was carried out by
either polyethylene glycol (PEG)-mediated protoplast transformation (Hopwood et al. (1985)
Genetic manipulation of Streptomyces: a laboratory manual. John Innes Foundation,
Norwich, UK) or E. coli-S. globisporus conjugation (Bierman et al. (1992) Gene 116: 43-69;
Matsushima and Baltz (1996) Microbiology 142: 261-267; Matsushima et al. (1994) Gene
146: 39-45); both methods were developed recently in our laboratory. In brief,
for transformation, pBS1012 and pBS1013 were propagated in E. coli ET12567 (MacNeil et
al. (1992) Gene 111: 61-68), and the resulting double-stranded plasmid DNA was denatured by
alkaline treatment (Ho and Chater (1997) J. Bacteriol. 179: 122-127). The denatured DNA (5
µL) and 200 µL of 25% PEG 1000 in P buffer (Hopwood et al., supra) were sequentially
added to 50 µL of S. globisporus protoplasts (10^9) in P buffer. The resulting suspension was
mixed immediately and spread on R2YE plates. After incubation at 28°C for 16 to 20 hrs,
the plates were overlaid with soft R2YE (0.7% agar) containing apramycin (100 µg/mL, final
concentration); incubation continued until colonies appeared (in 5 to 7 days). For
conjugation, E. coli S17-1(pBS1012) or E. coli S17-1(pBS1013) was grown to an OD600 of
0.3 to 0.4. Cells from a 20-mL culture were pelleted by centrifugation, washed in LB, and
resuspended in 2 mL of LB as the E. coli donors. S. globisporus spores (10^3 to 10^9) were
washed, resuspended in TSB, and incubated at 50°C for 10 min to activate germination.
After additional incubation at 37°C for 2 to 5 hrs, the spores were pelleted and resuspended
in LB as the S. globisporus recipients. The donors (100 µL) and recipients (100 µL) were
mixed and spread equally onto two modified ISP-4 or AS-1 plates freshly supplemented with
10 mM MgCl2 (see Media and culture conditions). The plates were incubated at 28°C for 16
to 22 hrs. After removal of most of the E. coli S17-1 donors by washing the surface with
sterile water, the plates were overlaid with 3 mL of soft LB (0.7% agar) containing nalidixic
acid (50 µg/mL, final concentration) and apramycin (100 µg/mL, final concentration) and
incubated at 28°C until exconjugants appeared (in approximately 5 days).
Unlike pBS1012, which is a Streptomyces non-replicating plasmid, pBS1013
bears a temperature-sensitive Streptomyces replication origin (Bierman et al. (1992) Gene
116: 43-69; Muth et al. (1989) Mol. Gen. Genet. 219: 341-348) that is unable to replicate at
temperatures above 34°C (Table 3), while the S. globisporus wild-type strain grows normally
up to 37°C. Thus, spores of S. globisporus (pBS1013), from either the transformants or the
exconjugants, were spread onto R2YE plates containing apramycin (100 µg/mL). The plates
were incubated directly at 37°C, and mutants resulting from single crossover homologous
recombination between pBS1013 and the S. globisporus chromosome were readily obtained
in 7 to 10 days. Alternatively, the plates were first incubated at 28°C for 2 days until
pinpoint-size colonies became visible and then shifted to 37°C to continue incubation.
Mutants resulting from single crossover homologous recombination grew out of the original
pinpoint-size colonies as easily distinguishable sectors in 7 to 10 days.
Construction of the sgcA and sgcB expression plasmids.
pBS1007 was digested with EcoRI and made blunt-ended by treatment with
the Klenow fragment of DNA polymerase I. Upon additional digestion with SphI, the
resulting 2.0-kb blunt-ended SphI fragment containing the intact sgcA gene was cloned into
the SmaI/SphI sites of pWHM79 (Shen et al. (1996) Proc. Natl. Acad. Sci. USA 93: 6600-
6604) to yield pBS1014. The latter was digested with EcoRI and HindIII, and the resulting
2.5-kb EcoRI/HindIII fragment was cloned into the same sites of pWHM3 (Vara et al.
(1989) J. Bacteriol. 171: 5872-5881) to yield pBS1015, in which the expression of sgcA is
under the control of the ermE* promoter (Bibb et al. (1994) Mol. Microbiol. 14: 533-545).
Alternatively, pBS1007 was digested with KpnI, removing most of the sgcA
gene, and the 5.2-kb KpnI fragment was recovered and self-ligated to yield pBS1016. The
ermE* promoter was subcloned from pWHM79 (Shen et al. (1996) Proc. Natl. Acad. Sci.
USA 93: 6600-6604) as a 0.45-kb EcoRI/SacI fragment and cloned into the same sites of
pBS1016 to yield pBS1017. The latter was digested with EcoRI and HindIII, and the
resulting 2.5-kb EcoRI/HindIII fragment was cloned into the same sites of pKC1139 to yield
pBS1018, in which the expression of sgcB is under the control of the ermE* promoter.
Determination of C-1027 production.
The production of C-1027 was detected by assaying its antibacterial activity
against M. luteus (Hu et al. (1988) J. Antibiot. 41: 1575-1579). From liquid culture,
fermentation supernatant (180 µL) was added to stainless steel cylinders placed on LB plates
pre-seeded with overnight M. luteus culture (0.01% vol/vol). From solid culture, a small
block (0.5 x 0.5 x 0.5 cm) of agar from either R2YE or ISP-4 medium was placed directly
on M. luteus-seeded LB plates. The plates were incubated at 37°C for 24 hrs, and C-1027
production was estimated by measuring the size of the inhibition zones.
Nucleotide sequence accession number.
The nucleotide sequence reported here has been deposited in the GenBank
database with the accession number AF201913.
Results.
No polyketide synthase gene was amplified by PCR from S. globisporus.
On the assumption that the C-1027 enediyne core is of polyketide origin, the
PCR approach was adopted to screen S. globisporus for any putative PKS genes, although it
is far from certain a priori whether the biosynthesis of the enediyne core involves a PKS and, if so,
whether the enediyne PKS will exhibit a type I or type II structural organization. PCR
methods for cloning either type I or type II PKS genes have been developed, and these
methods have proven to be very effective in cloning PKS genes from various polyketide-
producing actinomycetes (Kakavas et al. (1997) J. Bacteriol. 179: 7515-7522; Seow et al.
(1997) J. Bacteriol. 179: 7360-7368). While no distinctive product was amplified under any of the
conditions examined with either pair of primers designed for type II PKS, a single product
with the expected size of 0.75 kb was readily amplified by PCR from S. globisporus with
primers designed for type I PKS, and this product was subsequently cloned (pBS1001). Intriguingly,
sequence analysis of six randomly selected pBS1001 clones yielded an identical product,
indicative of a specific PCR amplification, the deduced amino acid sequence of which,
however, showed no homology to known PKSs (data not shown), excluding the possibility
of using PKS as a probe to identify the sgc biosynthesis gene cluster.
Cloning of a putative NGDH gene by PCR from S. globisporus.
The biosynthesis of various deoxyhexoses shares a common key
intermediate, 4-keto-6-deoxyglucose nucleoside diphosphate or its analogs, whose
formation from glucose nucleoside diphosphate is catalyzed by the NGDH enzyme, an
NAD+-dependent oxidoreductase (Liu and Thorson (1994) Ann. Rev. Microbiol. 48: 223-
256; Piepersberg (1997) pp. 81-163. In Biotechnology of antibiotics, 2nd ed. W. R. Strohl
(ed). Marcel Dekker, New York). The PCR method was adopted to clone the putative
NGDH gene from S. globisporus with primers designed according to the homologous
regions of various NGDH enzymes from actinomycetes (Decker et al. (1996) FEMS Lett.
141: 195-201), resulting in the amplification of a single product with the expected size of
0.55 kb (pBS1002). Sequence analysis of pBS1002 confirmed its identity as a part of a
putative NGDH gene.
To clone the complete NGDH gene, an S. globisporus genomic library,
constructed in the E. coli-Streptomyces shuttle vector pOJ446 (Bierman et al. (1992) Gene
116: 43-69; Rao et al. (1987) Methods Enzymol. 153: 166-198), was analyzed by Southern
hybridization with the PCR-amplified 0.55-kb fragment from pBS1002 as a probe. Of the
10,000 colonies screened, 36 positive colonies were identified, 9 of which were confirmed
by PCR to harbor the NGDH gene. Restriction enzyme mapping showed that all of them
contained a single 3.0-kb BamHI fragment hybridizing to the NGDH probe. Additional
chromosomal walking from this locus eventually led to the localization of the 75-kb sgc gene
cluster, covered by 18 overlapping cosmids as represented by pBS1004, pBS1005, and
pBS1006 (Fig. 5A). The 3.0-kb BamHI fragment was subcloned (pBS1007) (Fig. 5B), and
its nucleotide (nt) sequence was determined.
Analysis of the DNA sequences of the sgcA and sgcB genes.
Two complete open reading frames (ORFs) (sgcA and sgcB) were identified
within the 3.0-kb BamHI fragment of pBS1007, the 3,035-nt sequence of which is shown in
Figure 6. The sgcA gene most likely begins with an ATG at nt 101, preceded by a probable
ribosome binding site (RBS), GGAGG, and ends with a TGA stop codon at nt 1099. sgcA
should therefore encode a 332-amino acid protein with a molecular weight of 36,341 and an
isoelectric point of 6.01. A Gapped-BLAST search showed that the deduced sgcA gene
product is highly homologous to various putative and known NGDH enzymes from
antibiotic-producing actinomycetes, including Gdh from the erythromycin biosynthesis gene
cluster in Saccharopolyspora erythraea (64% identity and 70% similarity) (Linton et al.
(1995) Gene 153: 33-40), MtmE from the mithramycin biosynthesis gene cluster in
Streptomyces argillaceus (64% identity and 68% similarity) (Lombo et al. (1997) J.
Bacteriol. 179: 3354-3357), and TylA2 from the tylosin biosynthesis gene cluster in
Streptomyces fradiae (62% identity and 68% similarity) (Merson-Davies and Cundliffe
(1994) Mol. Microbiol. 13: 349-355) (Fig. 7). A conserved sequence of 14 amino acid
residues close to the N-termini can be easily identified in these proteins; it has been
described as a βαβ fold with an NAD+-binding motif, GxGxxG (Fig. 7, boxed), consistent
with their biochemical role in deoxyhexose biosynthesis (Liu and Thorson (1994) Ann. Rev.
Microbiol. 48: 223-256; Piepersberg (1997) pp. 81-163. In Biotechnology of antibiotics, 2nd
ed. W. R. Strohl (ed). Marcel Dekker, New York). The function of Gdh and MtmE as TDP-
glucose 4,6-dehydratases, requiring NAD+ as a cofactor, has been confirmed by an enzyme
assay following expression of the gdh (Linton et al. (1995) Gene 153: 33-40) and mtmE genes
(Lombo et al. (1997) J. Bacteriol. 179: 3354-3357) in E. coli, respectively, and by
purification of the Gdh protein from Sacc. erythraea (Vara et al. (1989) J. Bacteriol. 171:
5872-5881). From these data, it is reasonable to suggest that sgcA encodes the NGDH
enzyme required for the biosynthesis of the 4,6-dideoxy-4-dimethylamino-5-methylrhamnose
moiety of the C-1027 chromophore.
Transcribed in the same direction as sgcA, the sgcB gene is located 43 nt
downstream of sgcA. It should begin with a GTG at nt 1143, preceded by a probable RBS,
AGGAG, and end with a TGA at nt 2708 (Fig. 6). Correspondingly, sgcB should
encode a 521-amino acid protein with a molecular weight of 52,952 and an isoelectric point
of 4.64. Database comparison of the deduced sgcB product revealed that SgcB is closely
related to a family of membrane efflux pumps, such as LfrA from Mycobacterium smegmatis
(43% identity and 50% similarity, protein accession number AAC43550) (Takiff et al.
(1996) Proc. Natl. Acad. Sci. USA 93: 362-366), OrfA from Streptomyces cinnamomeus
(42% identity and 47% similarity, protein accession number AAB71209) (Sommer et al.
(1997) Appl. Environ. Microbiol. 63: 3553-3560), and RifP from the rifamycin biosynthesis
gene cluster in Amycolatopsis mediterranei (35% identity and 44% similarity, protein
accession number AAC01725) (August et al. (1998) Chem. Biol. 5: 69-79). These proteins are
membrane-localized transporters involved in the transport of antibiotics (conferring
resistance), sugars, and other substances. While direct evidence is lacking for RifP
conferring rifamycin resistance in A. mediterranei by transporting it out of the cells (August
et al. (1998) Chem. Biol. 5: 69-79), it has been proven that LfrA employs the
transmembrane proton gradient in an antiporter mode to drive the efflux of intracellular
antibiotics, resulting in fluoroquinolone resistance in M. smegmatis (Takiff et al. (1996)
Proc. Natl. Acad. Sci. USA 93: 362-366). On the basis of the high degree of amino acid
sequence conservation, an equivalent role could be proposed for SgcB, conferring resistance
by exporting C-1027 from S. globisporus.
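The ORF coordinates quoted above for sgcA and sgcB can be checked against the stated protein lengths and intergenic spacing with simple arithmetic. The sketch below assumes that each end coordinate is the last base of the stop codon and that coordinates are 1-based and inclusive; the function names are illustrative.

```python
def orf_summary(start_nt, stop_nt):
    """Length in nt, codons, and amino acids for an ORF.

    Assumes 1-based inclusive coordinates, with stop_nt being the last
    base of the stop codon (so the stop codon is not counted as a residue).
    """
    length = stop_nt - start_nt + 1
    if length % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    codons = length // 3
    return {"nt": length, "codons": codons, "aa": codons - 1}

def spacer_nt(upstream_stop_nt, downstream_start_nt):
    """Number of bases lying strictly between two adjacent ORFs."""
    return downstream_start_nt - upstream_stop_nt - 1

if __name__ == "__main__":
    print("sgcA:", orf_summary(101, 1099))
    print("sgcB:", orf_summary(1143, 2708))
    print("spacer:", spacer_nt(1099, 1143), "nt")
```

Under these assumptions the coordinates reproduce the 332- and 521-residue products and the 43-nt gap reported in the text.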
The cagA gene is clustered with the sgcA and sgcB locus.
To determine if cagA is clustered with the sgcA and sgcB locus, PCR primers
were designed according to the flanking regions of cagA (Sakata et al. (1992) Biosci.
Biotech. Biochem. 56: 1592-1595). A single product with the predicted size of 0.73 kb was
indeed amplified from several of the overlapping cosmids (which cover the 75-kb sgc
cluster), including pBS1004 and pBS1005, and its identity as cagA was confirmed by
sequencing. Restriction enzyme mapping and Southern hybridization analysis localized
cagA to a single 4.0-kb BamHI fragment that is approximately 14 kb upstream of the sgcA,B
locus (Fig. 5B). The 4.0-kb BamHI fragment was subcloned (pBS1008), and its nt sequence
was determined, revealing the cagA gene along with two additional ORFs (data not shown)
(Fig. 5). As reported earlier, cagA encodes a 142-amino acid protein that is processed by
cleavage of a 32-amino acid leader peptide to yield the mature CagA apoprotein (Sakata et al.
(1992) Biosci. Biotech. Biochem. 56: 1592-1595).
Disruption of the sgcA gene in S. globisporus.
To examine if the cloned sgc cluster encodes C-1027 biosynthesis, sgcA was
insertionally disrupted by a single crossover homologous recombination event to generate
C-1027-nonproducing mutant strains (Fig. 8A). Two plasmids were used, pBS1012 (a
pOJ260 derivative) and pBS1013 (a pKC1139 derivative), each of which contains a 0.75-kb
internal fragment from sgcA (Table 3). After introduction of pBS1012 into S. globisporus
either by PEG-mediated protoplast transformation or E. coli-S. globisporus conjugation,
transformants or exconjugants that were resistant to apramycin were isolated in all cases.
Since pBS1012 is derived from the Streptomyces non-replicating plasmid pOJ260, these
isolates must have resulted from integration of pBS1012 into the S. globisporus chromosome
by homologous recombination. Plasmid pBS1013 was similarly introduced into S.
globisporus. However, since pBS1013 is derived from pKC1139, which carries the
temperature-sensitive Streptomyces replication origin from pSG5 and can replicate normally
at 28°C (Muth et al. (1989) Mol. Gen. Genet. 219: 341-348), these isolates were subjected to
incubation at the non-permissive temperature of 37°C to eliminate free plasmids from the
host cells. As expected, normal growth stopped except for the recombinants, which continued to
grow at 37°C, indicative of integration of pBS1013 into the S. globisporus chromosome by homologous
recombination. The apramycin-resistant S. globisporus SB1001 and S. globisporus SB1002
strains were chosen as representatives of mutant strains with a disrupted sgcA gene resulting
from integration of pBS1012 and pBS1013, respectively.
To confirm that targeted sgcA disruption had occurred by a single crossover
homologous recombination event, Southern analysis of the DNA from the mutant strains was
performed, as exemplified for S. globisporus SB1001, with either pOJ260 or the 0.75-kb
SacII/KpnI internal fragment of sgcA from pBS1010 as a probe. As shown in Fig. 8B, a
distinctive band of the predicted size of 6.3 kb was detected with the pOJ260 vector as a
probe in all mutant strains (lanes 2, 3, and 4); this band was absent from the wild-type strain
(lane 1). Complementarily, when the 0.75-kb SacII/KpnI internal fragment of sgcA was used as
a probe (Fig. 8C), the 3.0-kb band in the wild-type strain (lane 1) was split into two
fragments with sizes of 6.3 kb and 1.0 kb in the mutant strains (lanes 2, 3, and 4), as
would be expected for disruption of sgcA by a single crossover homologous recombination
event.
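The band sizes reported above can be cross-checked by fragment-size bookkeeping: a single crossover inserts the whole plasmid into the chromosomal fragment, so the mutant bands should sum to the wild-type band plus the plasmid. The sketch below uses only sizes quoted in this section; it assumes the digest cuts once within the integrated plasmid, and the derived backbone size is an inference from these numbers rather than a value stated in the text.

```python
def inserted_dna_kb(wild_type_band_kb, mutant_bands_kb):
    """DNA gained by the locus, inferred from Southern band sizes."""
    return round(sum(mutant_bands_kb) - wild_type_band_kb, 2)

def vector_backbone_kb(inserted_kb, homology_fragment_kb):
    """Plasmid size minus the cloned internal sgcA fragment."""
    return round(inserted_kb - homology_fragment_kb, 2)

if __name__ == "__main__":
    gained = inserted_dna_kb(3.0, [6.3, 1.0])   # wild-type 3.0 kb; mutant 6.3 + 1.0 kb
    backbone = vector_backbone_kb(gained, 0.75)  # 0.75-kb internal sgcA fragment
    print("inserted DNA:", gained, "kb")
    print("implied vector backbone:", backbone, "kb")
```

That the 6.3-kb band, and not the 1.0-kb band, hybridizes to the pOJ260 probe is consistent with the vector sequence lying within the larger fragment.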
S. globisporus SB1001 and S. globisporus SB1002 are C-1027-nonproducing
mutants.
No apparent difference in growth characteristics and morphologies between
the wild-type S. globisporus and mutant S. globisporus SB1001 and S. globisporus SB1002
strains was observed. While C-1027 production in the wild-type S. globisporus strain could
be detected on day 3, peaked on day 5, and continued for a few more days, as judged by
assaying the antibacterial activity of the culture supernatant against M. luteus (Hu et al. (1988)
J. Antibiot. 41: 1575-1579), C-1027 production was completely abolished in the sgcA mutant
strains S. globisporus SB1001 and S. globisporus SB1002 (Fig. 9A). The latter phenotype
was identical to that of the AF40, AF44, and AF67 mutants, C-1027-nonproducing S.
globisporus strains that have been characterized previously (Fig. 9A and 9C) (Mao et al.
(1997) Chinese J. Biotechnol. 13: 195-199).
In vivo complementation of S. globisporus SB1001.
The ability of the wild-type sgcA gene to complement the disrupted sgcA gene
was tested in the S. globisporus SB1001 strain. The construction of pBS1015, in which the
expression of sgcA is under the control of the constitutive ermE* promoter, was described in
Materials and Methods. Both the pBS1015 construct and the pWHM3 vector as a control
were introduced by transformation into the S. globisporus SB1001 mutant strain. Culture
supernatants from each transformant were bioassayed against M. luteus for C-1027 production.
pBS1015 restored C-1027 production in S. globisporus SB1001 to the wild-type level; no
C-1027 production was detected in the control in which pWHM3 was introduced into S.
globisporus SB1001 (Fig. 9B and 9C). A significant reduction of C-1027 production was
observed when S. globisporus SB1001(pBS1015) was cultured under identical conditions but
without thiostrepton (Fig. 9B vs. 9C), indicating that pBS1015 may be unstable in S.
globisporus SB1001 in the absence of antibiotic selection pressure.
Expression of sgcB in S. globisporus.
The effect of sgcB on C-1027 production was tested in the wild-type S.
globisporus strain. The construction of pBS1018, in which the expression of sgcB is under
the control of the constitutive ermE* promoter, was described in Materials and Methods.
pBS1018 and the pKC1139 vector as a control were each introduced by conjugation into S.
globisporus. Culture supernatants from each exconjugant were harvested on days 3, 4, and
5, and assayed for C-1027 production by determining the antibacterial activity against M.
luteus. While no apparent difference in C-1027 production was observed between the S.
globisporus and S. globisporus (pKC1139) strains, a significant increase in C-1027
production (150±25%) was evident in the early stage of S. globisporus (pBS1018)
fermentation (Fig. 9D, day 3). However, this effect on C-1027 production leveled off as the
fermentation proceeded and became insignificant when the culture reached the late stationary
phase of fermentation (Fig. 9D, days 4 and 5).
Discussion.
Our inability to clone the putative enediyne PKS gene by PCR, with
degenerate primers designed according to the highly conserved amino acid sequences of
either type I or type II PKSs, or by DNA hybridization, with homologous type I or type II
PKS as probes (data not shown), was unexpected, since feeding experiments by
incorporation of [1-13C]- and [1,2-13C]acetate into the enediyne cores of esperamicin (Lam
et al. (1993) J. Am. Chem. Soc. 115: 12340-12345), dynemicin (Tokiwa et al. (1992) J. Am.
Chem. Soc. 114: 4107-4110), and neocarzinostatin (Hensens et al. (1989) J. Am. Chem. Soc.
111: 3295-3299) supported their polyketide origin. Although the enediyne cores are
structurally distinct from either the reduced or aromatic polyketides, the biosynthesis of
which is well characterized by type I or type II PKS, respectively, it could be imagined that
an enediyne PKS catalyzes the biosynthesis of a polyunsaturated linear heptaketide
intermediate that is subsequently cyclized into the enediyne core structure (Hu et al. (1994)
Mol. Microbiol. 14: 163-172; Spaink et al. (1991) Nature 354: 125-130; Thorson et al.
(1999) Bioorg. Chem. 27: 172-188). Alternatively, Hensens and co-workers proposed a
fatty acid origin for the enediyne core that was also consistent with the isotope labeling
results. These authors suggested oleate as a precursor that is shortened by loss of carbons
from both ends and is desaturated via the oleate-crepenynate pathway to furnish the enediyne
core (Hensens et al. (1989) J. Am. Chem. Soc. 111: 3295-3299). The latter pathway
resembles polyacetylene biosynthesis in higher plants and fungi and requires an acetylene-
forming enzyme; a plant gene encoding such an enzyme was identified recently (Lee et al.
(1998) Science 280: 915-918). Our DNA sequence analysis of approximately 60 kb of the
sgc gene cluster failed to reveal any gene that resembles a PKS.
Although little is known about the resistance mechanism for the enediyne
antibiotics in general, the apoproteins of the chromoprotein type of enediynes could be
viewed as resistance elements that confer self-resistance to the producing organisms by drug
sequestration (Thorson et al. (1999) Bioorg. Chem. 27: 172-188). Such a resistance
mechanism is in fact well established in antibiotic-producing actinomycetes, for example,
BlmA, the bleomycin-binding protein from Streptomyces verticillus (Shen et al. (1999)
Bioorg. Chem. 27: 155-171). Given the fact that antibiotic production genes have invariably
been found to be clustered in one region of the microbial chromosome, consisting of
structural, resistance, and regulatory genes, we adopted a strategy to clone the sgc gene
cluster by mapping a putative C-1027 structural gene to the previously cloned cagA gene,
considered as a resistance gene that encodes the C-1027 apoprotein.
We chose NGDH as the putative C-1027 structural gene on the basis of the
4,6-dideoxy-4-dimethylamino-5-methylrhamnose moiety of the C-1027 chromophore. It has
been well established that all deoxyhexoses could be derived from the common intermediate
of 4-keto-6-deoxy glucose nucleoside diphosphate, the biosynthesis of which from glucose
nucleoside diphosphate is catalyzed by an NGDH enzyme. We cloned the NGDH gene from
S. globisporus by PCR and used it as a probe to screen an S. globisporus genomic library,
resulting in the isolation of the 75-kb sgc gene cluster. DNA sequence analysis of a 3.0-kb
BamHI fragment of the sgc cluster confirmed the presence of the NGDH protein, encoded by
sgcA, along with sgcB, which encodes a transmembrane efflux protein (Fig. 6). The cagA gene
indeed resides approximately 14 kb upstream of sgcA (Fig. 5); DNA sequence analysis of a
4.0-kb BamHI fragment confirmed the identity of cagA along with two additional ORFs
(data not shown). These results underline once again the effectiveness of cloning natural
product biosynthesis gene clusters by exploiting the clustering phenomenon between
resistance and structural genes.
The involvement of the cloned gene cluster in C-1027 biosynthesis was
demonstrated by disrupting the sgcA gene to generate S. globisporus mutants, the ability of
which to produce C-1027 was completely abolished (Fig. 9A), and by complementing the
sgcA mutants in vivo upon expression of sgcA in trans to restore C-1027 production (Fig. 9B
and 9C). These data unambiguously establish that sgcA is essential for C-1027 production,
and thus support the conclusion that the cloned gene cluster encodes C-1027 biosynthesis. It
should be pointed out that, although the sgcA mutants S. globisporus SB1001 and S.
globisporus SB1002 were characterized as C-1027-nonproducing on the basis of the
antibacterial assay alone (Fig. 9A), this phenotype was identical to that of the control
AF40, AF44, and AF67 mutants (Fig. 9A and 9C). The latter strains were isolated
previously upon randomly mutagenizing the wild-type S. globisporus strain with acriflavine
and confirmed to be C-1027-nonproducing by both the antibacterial bioassay and an
antitumor spermatogonial assay (Mao et al. (1997) Chinese J. Biotechnol. 13: 195-199),
providing strong support to the current study. Gene disruption and complementation in S.
globisporus were made possible by the recently developed genetic system that allowed us to
introduce plasmid DNA into S. globisporus via either PEG-mediated protoplast
transformation (Hopwood et al. (1985) Genetic manipulation of Streptomyces: a laboratory
manual. John Innes Foundation, Norwich, UK) or E. coli-S. globisporus conjugation
(Bierman et al. (1992) Gene 116: 43-69; Matsushima and Baltz (1996) Microbiology 142:
261-267; Matsushima et al. (1994) Gene 146: 39-45) for analyzing the sgc biosynthesis gene
cluster in vivo. Given the difficulties encountered with calicheamicin biosynthesis in
Micromonospora echinospora, into which all attempts to introduce plasmid DNA have failed
(Thorson et al. (1999) Bioorg. Chem., 27: 172-188), the latter results underscore the
importance of selecting C-1027 as a model system for enediyne biosynthesis so that many of
the genetic tools developed in Streptomyces species can now be directly applied to the study
of enediyne biosynthesis.
Finally, the function of sgcB was probed by examining C-1027 production,
following expression of the gene in the wild-type S. globisporus strain. Database
comparison of the deduced amino acid sequence clearly suggested SgcB as a transmembrane
efflux protein, conferring resistance by exporting C-1027 out of the cell. Hence, in addition
to CagA, SgcB could be viewed as the second resistance element identified for C-1027
biosynthesis. Multiple resistance genes have been identified in numerous antibiotic
biosynthesis gene clusters (Hopwood (1997) Chem. Rev. 97: 2465-2497). It could be
imagined that CagA and SgcB function cooperatively to provide resistance — the C-1027
chromophore is first sequestered by binding to the preapoprotein CagA to form a complex,
which is then transported out of the cell by the efflux pump SgcB and processed by removing
the leader peptide to yield the chromoprotein, although we do not have any experimental
data to substantiate this speculation. Since it is known that yields for antibiotic production
could be profoundly altered by the introduction of extra copies of regulatory, resistance, or
structural genes into wild-type organisms (Hutchinson (1994) Bio/Technology 12: 375-380),
we tested the effect of overexpressing sgcB in S. globisporus on C-1027 production. While
no apparent adverse effect on C-1027 production was observed upon introduction of the
pKC1139 vector into S. globisporus (data not shown), a significant increase in C-1027
production (150±25%) was observed in the early stage of S. globisporus (pBS1017)
fermentation (Fig. 9D, day 3), supporting the predicted function for SgcB in C-1027
biosynthesis. We propose that C-1027 resistance could be a limiting factor at the onset of
C-1027 production, which is circumvented by the extra copy of the plasmid-borne sgcB, and
that overexpression of sgcB under the control of the constitutive ermE* promoter results in
an increase in C-1027 production. However, as the S. globisporus (pBS1017) fermentation
proceeds to its stationary phase, C-1027 resistance is no longer a limiting factor for overall
C-1027 production, and the effect of the extra copy of sgcB on C-1027 production consequently
becomes insignificant (Fig. 9D, day 5).
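A figure such as 150±25% is a mean and a spread over replicate measurements, normalized to a control. The following is a minimal sketch of that arithmetic in Python, assuming the value is expressed as a percentage of the wild-type level and that the ± term is a sample standard deviation; the text states neither. The function name and all titer values are hypothetical placeholders chosen for illustration, not measurements from this application.

```python
from statistics import mean, stdev

def relative_production(sample_titers, control_titers):
    """Return (mean, sd) of sample titers as a percentage of the mean control titer."""
    control_mean = mean(control_titers)
    percents = [100.0 * titer / control_mean for titer in sample_titers]
    return mean(percents), stdev(percents)

# Hypothetical titers in arbitrary units; NOT data from the application.
wild_type = [10.0, 10.0, 10.0]
overexpressor = [12.5, 15.0, 17.5]

avg, sd = relative_production(overexpressor, wild_type)
print(f"{avg:.0f} +/- {sd:.0f} % of wild type")
```

With these placeholder inputs the replicates normalize to 125, 150 and 175 percent of the control, so the summary is a mean of 150 with a sample standard deviation of 25.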
In conclusion, genetic analysis of enediyne biosynthesis has heretofore met
with little success in spite of considerable effort (Thorson et al. (1999) Bioorg. Chem., 27:
172-188). The localization of the sgc gene cluster and characterization of the sgcA and sgcB
genes have now provided an excellent basis for genetic and biochemical investigations
and/or modification of C-1027 biosynthesis, and gene disruption and overexpression in S.
globisporus clearly demonstrated the potential to construct enediyne-overproducing strains
and to produce novel enediynes that may have enhanced potency as anticancer drugs
using combinatorial biosynthesis and targeted mutagenesis. We envisage that the results
from C-1027 biosynthesis should facilitate the cloning and characterization of biosynthesis
gene clusters of other enediyne antibiotics in Streptomyces as well as in other actinomycetes,
and could have a great impact on the overall field of combinatorial biosynthesis.
It is understood that the examples and embodiments described herein are for
illustrative purposes only and that various modifications or changes in light thereof will be
suggested to persons skilled in the art and are to be included within the spirit and purview of
this application and scope of the appended claims. All publications, patents, and patent
applications cited herein are hereby incorporated by reference in their entirety for all
purposes.
CLAIMS
What is claimed is:
1. An isolated nucleic acid comprising a nucleic acid selected from the
group consisting of
a nucleic acid encoding any of C-1027 open reading frames (ORFs) -7
through 42, excluding ORF 9 (cagA);
a nucleic acid encoding a polypeptide encoded by any of C-1027 open
reading frames (ORFs) -7 through 42, excluding ORF 9 (cagA); and
a nucleic acid amplified by polymerase chain reaction (PCR) using
primer pairs that amplify any of C-1027 open reading frames (ORFs) -7 through 42,
excluding ORF 9 (cagA).
2. The isolated nucleic acid of claim 1, wherein said nucleic acid comprises a
nucleic acid encoding at least two open reading frames (ORFs) selected from the group
consisting of ORF -1 through ORF 42, excluding ORF 9 (cagA).
3. The isolated nucleic acid of claim 1, wherein said nucleic acid comprises a
nucleic acid encoding at least three open reading frames (ORFs) selected from the group
consisting of ORF -1 through ORF 42, excluding ORF 9 (cagA).
4. An isolated nucleic acid comprising a nucleic acid that specifically
hybridizes under stringent conditions to an open reading frame (ORF) of the C-1027
biosynthesis gene cluster, excluding ORF 9 (cagA), and can substitute for the ORF to which
it specifically hybridizes to direct the synthesis of an enediyne.
5. The isolated nucleic acid of claim 4, wherein said isolated nucleic acid
comprises a nucleic acid that specifically hybridizes under stringent conditions to a nucleic
acid selected from the group consisting of ORF -7, ORF -6, ORF -5, ORF -4, ORF -3,
ORF -2, ORF -1, ORF 0, ORF 1, ORF 2, ORF 3, ORF 4, ORF 5, ORF 6, ORF 7, ORF 8, ORF 10,
ORF 11, ORF 12, ORF 13, and ORF 14.
6. The isolated nucleic acid of claim 4, wherein said isolated nucleic acid
comprises a nucleic acid that specifically hybridizes under stringent conditions to a nucleic
acid selected from the group consisting of ORF 15, ORF 16, ORF 17, ORF 18, ORF 19,
ORF 20, ORF 21, ORF 22, ORF 23, ORF 24, ORF 25, ORF 26, ORF 27, ORF 28, ORF 29,
ORF 30, ORF 31, ORF 32, ORF 33, ORF 34, ORF 35, ORF 36, ORF 37, ORF 38, ORF 39,
ORF 40, ORF 41, and ORF 42.
7. The isolated nucleic acid of claim 5, wherein said isolated nucleic acid
comprises a nucleic acid selected from the group consisting of ORF -7, ORF -6, ORF -5,
ORF -4, ORF -3, ORF -2, ORF -1, ORF 0, ORF 1, ORF 2, ORF 3, ORF 4, ORF 5, ORF 6,
ORF 7, ORF 8, ORF 10, ORF 11, ORF 12, ORF 13, and ORF 14.
8. The isolated nucleic acid of claim 6, wherein said isolated nucleic acid
comprises a nucleic acid selected from the group consisting of ORF 15, ORF 16, ORF 17,
ORF 18, ORF 19, ORF 20, ORF 21, ORF 22, ORF 23, ORF 24, ORF 25, ORF 26, ORF 27,
ORF 28, ORF 29, ORF 30, ORF 31, ORF 32, ORF 33, ORF 34, ORF 35, ORF 36, ORF 37,
ORF 38, ORF 39, ORF 40, ORF 41, and ORF 42.
9. The isolated nucleic acid of claim 4, wherein said nucleic acid
comprises a nucleic acid that is a single nucleotide polymorphism (SNP) of a nucleic acid
selected from the group consisting of ORF -7, ORF -6, ORF -5, ORF -4, ORF -3, ORF -2,
ORF -1, ORF 0, ORF 1, ORF 2, ORF 3, ORF 4, ORF 5, ORF 6, ORF 7, ORF 8, ORF 10,
ORF 11, ORF 12, ORF 13, ORF 14, ORF 15, ORF 16, ORF 17, ORF 18, ORF 19, ORF 20,
ORF 21, ORF 22, ORF 23, ORF 24, ORF 25, ORF 26, ORF 27, ORF 28, ORF 29, ORF 30,
ORF 31, ORF 32, ORF 33, ORF 34, ORF 35, ORF 36, ORF 37, ORF 38, ORF 39, ORF 40,
ORF 41, and ORF 42.
10. An isolated gene cluster comprising open reading frames encoding
polypeptides sufficient to direct the assembly of a C-1027 enediyne or a C-1027 enediyne
analogue.
11. The gene cluster of claim 10, wherein said gene cluster is present in a
bacterium.
12. The gene cluster of claim 11, wherein said gene cluster is present in a
bacterium selected from the group consisting of Actinomycetes, Actinoplanetes,
Actinomadura, Micromonospora, and Streptomycetes.
13. The gene cluster of claim 11, wherein said gene cluster is present in a
bacterium selected from the group consisting of Streptomyces globisporus, Streptomyces
lividans, Streptomyces coelicolor, Micromonospora echinospora ssp. calichensis,
Actinomadura verrucosospora, Micromonospora chersina, Streptomyces carzinostaticus, and
Actinomycete L585-6.
14. The gene cluster of claim 13, wherein one or more open reading
frames is operatively linked to a heterologous promoter.
15. An isolated polypeptide comprising a catalytic domain encoded by a
nucleic acid of a C-1027 gene cluster wherein said nucleic acid comprises a nucleic acid
selected from the group consisting of
a nucleic acid encoding any of C-1027 open reading frames (ORFs) -7
through 42, excluding ORF 9 (cagA); and
a nucleic acid amplified by polymerase chain reaction (PCR) using
any one of the primer pairs identified in Tables I and II that specifically amplify one or more
of ORFs -7 through 42, excluding ORF 9 (cagA).
16. The polypeptide of claim 15, wherein said polypeptide is encoded by
at least two open reading frames selected from the group consisting of C-1027 open reading
frames (ORFs) -7 through 42, excluding ORF 9 (cagA).
17. The polypeptide of claim 15, wherein said polypeptide is encoded by
at least three open reading frames selected from the group consisting of C-1027 open reading
frames (ORFs) -7 through 42, excluding ORF 9 (cagA).
18. An expression vector comprising a nucleic acid of any one of claims 1
through 9.
19. A host cell transformed with an expression vector of claim 18.
20. The host cell of claim 19, wherein said cell is transformed with an
exogenous nucleic acid comprising a gene cluster encoding polypeptides sufficient to direct
the assembly of a C-1027 enediyne or a C-1027 enediyne analogue.
21. The host cell of claim 19, wherein said host cell is a bacterium.
22. The host cell of claim 21, wherein said bacterium is selected from the
group consisting of Actinomycetes, Actinoplanetes, Actinomadura, Micromonospora, and
Streptomycetes.
23. The host cell of claim 21, wherein said bacterium is selected from the
group consisting of Streptomyces globisporus, Streptomyces lividans, Streptomyces
coelicolor, Micromonospora echinospora ssp. calichensis, Actinomadura verrucosospora,
Micromonospora chersina, Streptomyces carzinostaticus, and Actinomycete L585-6.
24. A method of chemically modifying a biological molecule, said method
comprising contacting a biological molecule that is a substrate for a polypeptide encoded by
a C-1027 biosynthesis gene cluster open reading frame, with a polypeptide encoded by a
C-1027 biosynthesis gene cluster open reading frame whereby said polypeptide chemically
modifies said biological molecule.
25. The method of claim 24, wherein said polypeptide is an enzyme
selected from the group consisting of a hydroxylase, a homocysteine synthase, a
dNDP-glucose dehydrogenase, a citrate carrier protein, a C-methyl transferase, an N-methyl
transferase, an aminotransferase, a CagA apoprotein, an NDP-glucose synthase, an
epimerase, an acyl transferase, a coenzyme F390 synthase, an epoxide hydrolase, an
anthranilate synthase, a glycosyl transferase, a monooxygenase, a type II condensation
protein, an aminomutase, a type II adenylation protein, an O-methyl transferase, a P-450
hydroxylase, an oxidoreductase, and a proline oxidase.
26. The method of claim 24, wherein said method comprises contacting
said biological molecule with at least two different polypeptides encoded by C-1027
biosynthesis gene cluster open reading frames.
27. The method of claim 24, wherein said method comprises contacting
said biological molecule with at least three different polypeptides encoded by C-1027
biosynthesis gene cluster open reading frames.
28. The method of claim 24, wherein said contacting is in a host cell.
29. The method of claim 28, wherein said host cell is a bacterium.
30. The method of claim 24, wherein said contacting is ex vivo.
31. The method of claim 28, wherein said biological molecule is an
endogenous metabolite produced by said host cell.
32. The method of claim 28, wherein said biological molecule is an
exogenously supplied metabolite.
33. The method of claim 28, wherein said host cell is a eukaryotic cell.
34. The method of claim 33, wherein said eukaryotic cell is selected from
the group consisting of a mammalian cell, a yeast cell, a plant cell, a fungal cell, and an
insect cell.
35. The method of claim 28, wherein said host cell synthesizes sugars and
glycosylates the biological molecule.
36. The method of claim 35, wherein said host cell synthesizes
deoxy sugars.
37. The method of claim 24, wherein said method further comprises
contacting said biological molecule with a polyketide synthase or a non-ribosomal
polypeptide synthetase.
38. The method of claim 24, wherein said contacting is in a
bacterial cell.
39. The method of claim 24, wherein said contacting is ex vivo.
40. The method of claim 24, wherein said method comprises contacting
said biological molecule with substantially all of the polypeptides encoded by C-1027
biosynthesis gene cluster open reading frames and said method produces an enediyne or
enediyne analogue.
41. The method of claim 24, wherein said biological molecule is a fatty
acid and said biological molecule is contacted with a C-1027 orf polypeptide selected from the
group consisting of an epoxide hydrase, a monooxygenase, an iron-sulfur flavoprotein, a
P-450 hydroxylase, an oxidoreductase, and a proline oxidase.
42. The method of claim 41, wherein said biological molecule is a fatty
acid and said biological molecule is contacted with a plurality of C-1027 orf polypeptides
comprising an epoxide hydrase, a monooxygenase, an iron-sulfur flavoprotein, a P-450
hydroxylase, an oxidoreductase, and a proline oxidase.
43. The method of claim 42, wherein said biological molecule is contacted
with polypeptides encoded by ORF17, ORF20, ORF21, ORF29, ORF30, ORF32, ORF35,
and ORF38.
44. The method of claim 41, wherein said biological molecule is contacted
with polypeptides encoded by ORF 15, ORF 16, ORF 28, ORF 3, ORF 14, and ORF 13.
45. The method of claim 44 wherein said biological molecule is also
contacted with polypeptides encoded by ORF 4 and ORF 3.
46. The method of claim 24, wherein said method comprises contacting a
sugar with one or more C-1027 open reading frame polypeptides selected from the group
consisting of a dNDP-glucose synthase, a dNDP glucose dehydratase, an epimerase, an
aminotransferase, a C-methyltransferase, an N-methyltransferase, and a glycosyl transferase.
47. The method of claim 46, wherein said method comprises contacting a
dNDP-glucose with a plurality of C-1027 open reading frame polypeptides comprising a
dNDP-glucose synthase, a dNDP glucose dehydratase, an epimerase, an aminotransferase, a
C-methyltransferase, an N-methyltransferase, and a glycosyl transferase.
48. The method of claim 24, wherein said method comprises contacting an
amino acid with one or more C-1027 open reading frame polypeptides selected from
the group consisting of a hydroxylase, an aminomutase, a type II NRPS condensation
enzyme, a type II NRPS adenylation enzyme, and a type II peptidyl carrier protein.
49. The method of claim 48, wherein said method comprises contacting an
amino acid with a plurality of C-1027 open reading frame polypeptides comprising a
hydroxylase, a halogenase, an aminomutase, a type II NRPS condensation enzyme, a type II
NRPS adenylation enzyme, and a type II peptidyl carrier protein.
50. The method of claim 48, wherein said amino acid is a tyrosine.
51. A method of synthesizing a chromoprotein type enediyne core, said
method comprising contacting a fatty acid with one or more C-1027 orf polypeptides
selected from the group consisting of an epoxide hydrase, a monooxygenase, an iron-sulfur
flavoprotein, a P-450 hydroxylase, an oxidoreductase, and a proline oxidase.
52. The method of claim 51, wherein said fatty acid is contacted with a
plurality of C-1027 orf polypeptides comprising an epoxide hydrase, a monooxygenase, an
iron-sulfur flavoprotein, a P-450 hydroxylase, an oxidoreductase, and a proline oxidase.
53. The method of claim 52, wherein said fatty acid is contacted with
polypeptides encoded by ORF17, ORF20, ORF21, ORF29, ORF30, ORF32, ORF35, and
ORF38.
54. A method of synthesizing a deoxysugar, said method comprising
contacting a sugar with one or more C-1027 open reading frame polypeptides selected from
the group consisting of a dNDP-glucose synthase, a dNDP glucose dehydratase, an
epimerase, an aminotransferase, a C-methyltransferase, an N-methyltransferase, and a
glycosyl transferase.
55. The method of claim 54, wherein said method comprises contacting a
dNDP-glucose with a plurality of C-1027 open reading frame polypeptides comprising a
dNDP-glucose synthase, a dNDP glucose dehydratase, an epimerase, an aminotransferase, a
C-methyltransferase, an N-methyltransferase, and a glycosyl transferase.
56. The method of claim 55, wherein said dNDP-glucose is contacted with
polypeptides encoded by ORF17, ORF20, ORF21, ORF29, ORF30, ORF32, ORF35, and
ORF38.
57. A method of synthesizing a beta amino acid, said method comprising
contacting an amino acid with one or more C-1027 open reading frame polypeptides
selected from the group consisting of a hydroxylase, an aminomutase, a type II NRPS
condensation enzyme, a type II NRPS adenylation enzyme, and a type II peptidyl carrier
protein.
58. The method of claim 57, wherein said method comprises contacting an
amino acid with a plurality of C-1027 open reading frame polypeptides comprising a
hydroxylase, a halogenase, an aminomutase, a type II NRPS condensation enzyme, a type II
NRPS adenylation enzyme, and a type II peptidyl carrier protein.
59. The method of claim wherein said amino acid is contacted with
polypeptides encoded by ORF 4, ORF 11, ORF 24, ORF 23, ORF 25, and ORF 26.
60. The method of claim 57, wherein said amino acid is a tyrosine.
61. A method of synthesizing an enediyne or an enediyne analogue, said
method comprising:
culturing a cell comprising a recombinantly modified C-1027 gene
cluster under conditions whereby said cell expresses said enediyne or enediyne analogue;
and
recovering said enediyne or enediyne analogue.
62. The method of claim 61, wherein said gene cluster is present in a
bacterium.
63. The gene cluster of claim 62, wherein said gene cluster is present in a
bacterium selected from the group consisting of Actinomycetes, Actinoplanetes,
Actinomadura, Micromonospora, and Streptomycetes.
64. The gene cluster of claim 62, wherein said gene cluster is present in a
bacterium selected from the group consisting of Streptomyces globisporus, Streptomyces
lividans, Streptomyces coelicolor, Micromonospora echinospora ssp. calichensis,
Actinomadura verrucosospora, Micromonospora chersina, Streptomyces carzinostaticus, and
Actinomycete L585-6.
65. The method of claim 61, wherein said gene cluster is present in a
eukaryotic cell.
66. The method of claim 65, wherein said eukaryotic cell is selected from
the group consisting of a mammalian cell, a yeast cell, a plant cell, a fungal cell, and an
insect cell.
67. The method of claim 61, wherein said host cell synthesizes sugars and
glycosylates said enediyne or enediyne analogue.
68. The method of claim 67, wherein said host cell synthesizes
deoxy sugars.
69. A method of making a cell resistant to an enediyne or an enediyne
metabolite, said method comprising expressing in said cell one or more isolated C-1027 open
reading frame nucleic acids that encode a protein selected from the group consisting of a
CagA apoprotein, a SgcB transmembrane efflux protein, a transmembrane transport protein,
a Na+/H+ transporter, an ABC transporter, a glycerol phosphate transporter, and a UvrA-like
protein.
70. The method of claim 69, wherein said isolated C-1027 open reading
frame nucleic acids are selected from the group consisting of ORF 9, ORF 2, ORF 27, ORF 0,
ORF 1 C-terminus, ORF 2, and ORF 1 N-terminus.
71. The method of claim 69, wherein said cell is a bacterial cell.
GENE CLUSTER FOR PRODUCTION OF THE ENEDIYNE
ANTITUMOR ANTIBIOTIC C-1027
ABSTRACT OF THE DISCLOSURE
This invention provides nucleic acid sequences and characterization of the
gene cluster responsible for the biosynthesis of the enediyne C-1027 (produced by
Streptomyces globisporus). Methods are provided for the biosynthesis of enediynes,
enediyne analogs and other biological molecules.
[Reaction scheme: sugar intermediates bearing OH, OP and ONDP groups, with steps labeled ORF6, ORF8, ORF7 and ORF19; chemical structures not reproduced.]
R = enediyne core
ORF10: dNDP-glucose synthase, 355 aa
ORF1: dNDP-glucose dehydratase, 332 aa
ORF12: epimerase, 192 aa
ORF8: aminotransferase, 410 aa
ORF6: C-methyltransferase, 423 aa
ORF7: N-methyltransferase, 244 aa
ORF19: glycosyl transferase, 459 aa
Fig. 2
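The Fig. 2 legend amounts to a small lookup table of ORF, assigned function and protein length. The following sketch, in Python (a language chosen here for illustration only), records the entries exactly as the legend lists them; the names `DEOXYSUGAR_ORFS` and `orfs_for` are hypothetical helpers, and the list order is the legend order, which is not asserted to be the order of the enzymatic steps.

```python
# Entries copied from the Fig. 2 legend: (ORF number, assigned function, length in aa).
# The order follows the legend and may differ from the order of the pathway steps.
DEOXYSUGAR_ORFS = [
    (10, "dNDP-glucose synthase", 355),
    (1, "dNDP-glucose dehydratase", 332),
    (12, "epimerase", 192),
    (8, "aminotransferase", 410),
    (6, "C-methyltransferase", 423),
    (7, "N-methyltransferase", 244),
    (19, "glycosyl transferase", 459),
]

def orfs_for(keyword):
    """Return ORF numbers whose assigned function contains the keyword (case-insensitive)."""
    key = keyword.lower()
    return [orf for orf, function, _ in DEOXYSUGAR_ORFS if key in function.lower()]

# Combined length, in amino acids, of the seven proteins in the legend.
total_residues = sum(length for _, _, length in DEOXYSUGAR_ORFS)

print(orfs_for("methyltransferase"))
print(total_residues)
```

A table of this kind makes it straightforward to check which ORFs a given enzyme class in the claims refers to.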
ORF4: Hydroxylase, 527 aa
ORF11: Hydroxylase/halogenase, 492/494 aa
ORF24: Aminomutase, 539 aa
ORF23: Type II NRPS condensation enzyme, 459 aa
ORF25: Type II NRPS adenylation enzyme, 716 aa
ORF26: Type II peptidyl carrier protein, 93 aa
Fig. 3A
R = enediyne core
ORF15: Anthranilate synthase I, 493 aa
ORF16: Anthranilate synthase II, 220 aa
ORF28: O-methyltransferase, 350 aa
ORF3: Coenzyme F390 synthetase, 463 aa
ORF14: Coenzyme F390 synthetase, 484 aa
ORF13: O-acyltransferase, 378 aa
Fig. 3B
Fatty acid
ORF17: Epoxide hydrase
ORF20: Monooxygenase
ORF21: Iron-sulfur flavoprotein
ORF29: P-450 hydroxylase
ORF30: Oxidoreductase
ORF32: Oxidoreductase
ORF35: Proline oxidase
ORF38: P-450 hydroxylase
ORF13: O-acyltransferase, ORF19: Glycosyl transferase, ORF23: Type II NRPS condensation enzyme
Fig. 4
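The ORFs that the Fig. 4 legend places on the fatty acid precursor can be cross-checked against the ORF list recited in claims 43 and 53 and the enzyme classes recited in claims 41 and 51. The sketch below is one way to do that check in Python; the variable names are hypothetical, and enzyme class names are lower-cased with the legend's spelling so the two sources compare directly.

```python
# ORFs listed in the Fig. 4 legend as acting on the fatty acid precursor.
FIG4_CORE_ORFS = {
    17: "Epoxide hydrase",
    20: "Monooxygenase",
    21: "Iron-sulfur flavoprotein",
    29: "P-450 hydroxylase",
    30: "Oxidoreductase",
    32: "Oxidoreductase",
    35: "Proline oxidase",
    38: "P-450 hydroxylase",
}

# ORF numbers recited in claims 43 and 53.
CLAIMED_ORFS = {17, 20, 21, 29, 30, 32, 35, 38}

# Enzyme classes recited in claims 41 and 51, normalized to the legend's spelling.
CLAIMED_CLASSES = {
    "epoxide hydrase", "monooxygenase", "iron-sulfur flavoprotein",
    "p-450 hydroxylase", "oxidoreductase", "proline oxidase",
}

missing_from_legend = CLAIMED_ORFS - set(FIG4_CORE_ORFS)
missing_from_claims = set(FIG4_CORE_ORFS) - CLAIMED_ORFS
legend_classes = {name.lower() for name in FIG4_CORE_ORFS.values()}

print(sorted(missing_from_legend), sorted(missing_from_claims))
print(legend_classes == CLAIMED_CLASSES)
```

Both difference sets are empty and the class sets agree, so the eight legend ORFs and the six claimed enzyme classes describe the same group.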
[Restriction maps of the sgc gene cluster; graphics not reproduced. B marks restriction sites on the maps.]
Scale: 20 to 80 kb
Fig. 5A
Clones: pBS1004, pBS1005, pBS1006
Fig. 5B
Labels: pBS1008, Probe 2, cagA, S/B, pBS1007, Probe 1, sgcA, sgcB
Fig. 5C
Scale: 10 to 80 kb
ORF: 4 22 18 5 23 26 24 25 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42
ORF: -7 -6 -5 -4 -3 -2 -1 0 11 10 9 8 7 6 17 16 15 14 21 13 12 20 19 1 2 3 4
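The two rows of ORF labels on the map can be checked against the range recited in claim 1 (ORFs -7 through 42, excluding ORF 9). The sketch below, in Python, is a minimal consistency check under one stated assumption: that the row starting at ORF -7 is read first and the row starting at ORF 4 continues it, with ORF 4 repeated at the join. The variable names are hypothetical.

```python
# ORF labels transcribed from the two map rows.
ROW_FROM_MINUS_7 = [-7, -6, -5, -4, -3, -2, -1, 0, 11, 10, 9, 8, 7, 6,
                    17, 16, 15, 14, 21, 13, 12, 20, 19, 1, 2, 3, 4]
ROW_FROM_4 = [4, 22, 18, 5, 23, 26, 24, 25] + list(range(27, 43))

# Assumption: the second row continues the first, so the repeated ORF 4 is dropped once.
order = ROW_FROM_MINUS_7 + ROW_FROM_4[1:]

expected = set(range(-7, 43))            # ORFs -7 through 42 inclusive
assert set(order) == expected            # every ORF in the range appears on the map
assert len(order) == len(expected)       # and none appears twice after merging

claimed = [orf for orf in order if orf != 9]   # claim 1 excludes ORF 9 (cagA)
print(len(order), len(claimed))
```

Under that reading the map carries 50 ORF labels, of which 49 remain once ORF 9 is excluded.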
[Drawing sheets from sheet 6 onward: sequence figures; the rotated text is not recoverable from the scan.]
CJ
CJ
CD
Eh
CD
CD
CD
CJ
CD
Cj
CD
CD
£j
CD
CJ
<C
Eh
CJ
<!
CD
CD
Cj
CD
CD
Ph
3
O
CJ
H
o
a
E-t
c
CJ
CD
CJ
CD
<
Eh
o
a
Eh
CD
CJ
6
EH
CJ
CJ
CJ
%
CD
CD
CD
CJ
Eh
CJ
CJ
a
CJ
CJ
EH
CJ
CD
«<
CD
CD
CJ
CJ
Eh
CD
CJ
CD
CJ
CD
CJ
CJ
CJ
CD
a
CD
Eh
Eh
CJ
CJ
Eh
<H
iH
O
O
Q\
O
CN
CO
Fig. 7. Alignment of Gdh, TylA2, SgcA, and MtmE with consensus (positions 1-336).
Figure (map): sgcA locus in S. globisporus C-1027 and in S. globisporus SB1001, with plasmid pBS1012 (4.3-kb); labeled fragment sizes 0.25-kb, 0.75-kb, 1.0-kb, 2.0-kb, and 6.3-kb; site labels B, S, K, X, and K/B.
PATENT APPLICATION DECLARATION
(Attorney's Docket No.: 2500.125US2)
Each of the Applicants named below hereby declares as follows:
1. My residence, post office address and country of citizenship given below
are true and correct.
2. I believe I am the original, first and joint inventor of the subject matter
which is claimed and for which a patent is sought in the patent application entitled "GENE
CLUSTER FOR PRODUCTION OF THE ENEDIYNE ANTITUMOR ANTIBIOTIC C-1027,"
Serial No. , filed January 5, 2000, and I have reviewed and understand the
contents of the specification, including its claims.
3. I acknowledge my duty to disclose to the Office all information known to
me to be material to patentability of this application, in accordance with 37 C.F.R. Section 1.56,
which is defined on the attached page.
I further declare that all statements made herein of my own knowledge are true and
that all statements made on information and belief are believed to be true; and further that these
statements were made with the knowledge that willful false statements and the like so made are
punishable by fine or imprisonment, or both, under Section 1001 of Title 18 of the United States
Code, and that such willful false statements may jeopardize the validity of the application or any
patent issuing thereon.
Date:
Ben Shen
Residence and 1842 Rushmore Lane
Post Office Address: Davis, California 95616
(Citizenship: People's Republic of China)
Date:
Wen Liu
Residence and Institute of Medicinal Biotechnology
Post Office Address: Tiantan, Beijing, 100005, China
(Citizenship: People's Republic of China)
Serial No.: SERIAL NO.
Date:
Steven D. Christenson
Residence and 1079 Monarch Lane
Post Office Address: Davis, California 95616
(Citizenship: United States)
Date:
Scott Standage
Residence and 63 Tudor Road
Post Office Address: Bornet, Herts, EN5 5NW, United Kingdom
(Citizenship: United Kingdom)
Section 1.56 Duty to Disclose Information Material to Patentability.
(a) A patent by its very nature is affected with a public interest. The public interest is
best served, and the most effective patent examination occurs when, at the time an application is being
examined, the Office is aware of and evaluates the teachings of all information material to patentability. Each
individual associated with the filing and prosecution of a patent application has a duty of candor and good
faith in dealing with the Office, which includes a duty to disclose to the Office all information known to that
individual to be material to patentability as defined in this section. The duty to disclose information exists
with respect to each pending claim until the claim is cancelled or withdrawn from consideration, or the
application becomes abandoned. Information material to the patentability of a claim that is cancelled or
withdrawn from consideration need not be submitted if the information is not material to the patentability of
any claim remaining under consideration in the application. There is no duty to submit information which
is not material to the patentability of any existing claim. The duty to disclose all information known to be
material to patentability is deemed to be satisfied if all information known to be material to patentability of
any claim issued in a patent was cited by the Office or submitted to the Office in the manner prescribed by
§§ 1.97(b)-(d) and 1.98. However, no patent will be granted on an application in connection with which
fraud on the Office was practiced or attempted or the duty of disclosure was violated through bad faith or
intentional misconduct. The Office encourages applicants to carefully examine:
(1) prior art cited in search reports of a foreign patent office in a counterpart
application, and
(2) the closest information over which individuals associated with the filing or
prosecution of a patent application believe any pending claim patentably defines, to make sure that
any material information contained therein is disclosed to the Office.
(b) Under this section, information is material to patentability when it is not cumulative
to information already of record or being made of record in the application, and
(1) It establishes, by itself or in combination with other information, a prima facie case
of unpatentability of a claim; or
(2) It refutes, or is inconsistent with, a position the applicant takes in:
(i) Opposing an argument of unpatentability relied on by the Office, or
(ii) Asserting an argument of patentability.
A prima facie case of unpatentability is established when the information compels a conclusion that a claim
is unpatentable under the preponderance of evidence, burden-of-proof standard, giving each term in the claim
its broadest reasonable construction consistent with the specification, and before any consideration is given
to evidence which may be submitted in an attempt to establish a contrary conclusion of patentability.
(c) Individuals associated with the filing or prosecution of a patent application within
the meaning of this section are:
(1) Each inventor named in the application;
(2) Each attorney or agent who prepares or prosecutes the application; and
(3) Every other person who is substantively involved in the preparation or prosecution
of the application and who is associated with the inventor, with the assignee or with anyone to whom
there is an obligation to assign the application.
(d) Individuals other than the attorney, agent or inventor may comply with this section
by disclosing information to the attorney, agent, or inventor.
GENE CLUSTER FOR PRODUCTION OF THE ENEDIYNE
ANTITUMOR ANTIBIOTIC C-1027
SEQUENCE LISTING
SEQ ID No. 1. C-1027 gene cluster DNA sequence from 1 to 42,000, ORF-(-7) to ORF-26
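The listing prints each 60-nucleotide segment as the upper strand, a position ruler, the complementary lower strand, and, where an ORF is annotated, its one-letter translation. For ORFs encoded on the lower strand the translation reads right to left. The following is a minimal sketch, not part of the filed listing, of how one segment can be checked for internal consistency. It assumes the standard genetic code; the helper names (complement, reverse_complement, translate) are hypothetical, and the reading-frame offset of 2 was chosen by inspection of the first segment only.

```python
# Standard genetic code, codons enumerated in T, C, A, G order (assumption:
# the listing uses the standard table for its one-letter translations).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq: str) -> str:
    """Base-by-base complement, same left-to-right order as the input."""
    return seq.upper().translate(_COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """Lower strand read 5' to 3'."""
    return complement(seq)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons from the start of seq; '*' marks a stop."""
    seq = seq.upper()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


# Positions 1-60 as printed in the listing (upper and lower rows).
TOP = "GTCGACTCTAGAGGATCCCGGGTGCGGAGTAGGGGTTACGGACGAAGGAGGGGTGCCCGG"
BOTTOM = "CAGCTGAGATCTCCTAGGGCCCACGCCTCATCCCCAATGCCTGCTTCCTCCCCACGGGCC"

# The lower row should be the complement of the upper row.
rows_agree = complement(TOP) == BOTTOM

# ORF -7 lies on the lower strand; translate it 5' to 3' (frame offset 2,
# found by inspection), then reverse to match the right-to-left annotation.
protein = translate(reverse_complement(TOP)[2:])
annotation_order = protein[:17][::-1]

print(rows_agree)
print(annotation_order)
```

The same check can be repeated per segment to catch transcription errors in either strand, provided the frame offset is re-derived for each ORF.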
GTCGACTCTAGAGGATCCCGGGTGCGGAGTAGGGGTTACGGACGAAGGAGGGGTGCCCGG
1 + + + + + + 60
CAGCTGAGATCTCCTAGGGCCCACGCCTCATCCCCAATGCCTGCTTCCTCCCCACGGGCC
-7-* *LIGPASYPNRVFSPHG
CGACGCCTGCGGCGAAGGGCGGTTCCTTGAGTTCGAGGCCGGTGGCGAGGACGACGTGGT
61 + + + + + + 120
GCTGCGGACGCCGCTTCCCGCCAAGGAACTCAAGCTCCGGCCACCGCTCCTGCTGCACCA
-7 AVGAAFP PEKLELGTALVVH
CCGCGTCGAGGATCTGCGTGTCGGGGAGCGGCCCAGGGCGCAGCCCCTCGGTCAGGTACG
121 + + + + + + 180
GGCGCAGCTCCTAGACGCACAGCCCCTCGCCGGGTCCCGCGTCGGGGAGCCAGTCCATGC
-7 DADLIQTDPLPGPRLGETLY
GGGTGAGGCCCCTGACGGTCACCTCGAAGCAGCGGTCGTGGGACCGGGCGTCGAGCGCCT
181 + + + + + + 240
CCCACTCCGGGGACTGCCAGTGGAGCTTCGTCGCCAGCACCCTGGCCCGCAGCTCGCGGA
-7 PTLGRVTVEFCRDHSRADLA
CCCCGTCCGCTTCCACAAGGACGACGCCGGGACAGGACTCCCGTGCGGCCTCGACCAGTC
241 + + + + + + 300
GGGGCAGGCGAAGGTGTTCCTGCTGCGGCCCTGTCCTGAGGGCACGCCGGAGCTGGTCAG
-7 EGDAEVLVVGPCSERAAEVL
GGGCGTCGAGGTAGTCCTGGAAGATGCGGCGGGGGGCGGGGCCCTGTTCGGTGAACTTCC
301 + + + + + + 360
CCCGCAGCTCCATCAGGACCTTCTACGCCGCCCCCCGCCCCGGGACAAGCCACTTGAAGG
-7 RADLYDQFIRRPAPGQETFK
ACGAAGCCCAGCGCCGGGGCCAGTCGCGCCGGTCGGCCTCCTGGTTGGCCCAGTTGATGA
361 + + + + + + 420
TGCTTCGGGTCGCGGCCCCGGTCAGCGCGGCCAGCCGGAGGACCAACCGGGTCAACTACT
-7 WSAWRRPWDRRDAEQNAWNI
AGTCGAGCACGTCCTCGCGGAACACCGACATCCTGCCGGCCTGGATATTGAAGACGTGGT
421 + + + + + + 480
TCAGCTCGTGCAGGAGCGCCTTGTGGCTGTAGGACGGCCGGACCTATAACTTCTGCACCA
-7 FDLVDERFVSMRGAQINFVH
CCCAGGGGTTGCCGTCACGGTGATAGGCGACGCCGGCCGAGCGGTAgGCGGCGCGCCGCT
481 + + + + + + 540
GGGTCCCCAACGGCAGTGCCACTATCCGCTGCGGCCGGCTCGCCATcCGCCGCGCGGCGA
-7 DWPNGDRHYAVGASRYAARR
CCAGGAGGACGACTTCCAGCGGTCTTCTCGCGAAATGAAGCAGGCGTATCGCGGTCGCCG
541 + + + + + + 600
GGTCCTCCTGCTGAAGGTCGCCAGAAGAGCGCTTTACTTCGTCCGCATAGCGCCAGCGGC
-7 ELLVVELPRRAFHLLRIATA
TGCCTGCCAGGCCCGCCCCTACGACCAGCACCCTGGGGCGCGCACCCGTCATGCCCATGA
601 + + + + + + 660
ACGGACGGTCCGGGCGGGGATGCTGGTCGTGGGACCCCGCGCGTGGGCAGTACGGGTACT
-7-< TGALGAGVVLVRPRAGTMGM
AGCCTCCCCCGCTGACTCAGGGCGgCGCGTCGCGCGCTCCCGTCGGTGTCCTCGCTGACT
661 + + + + + + 720
TCGGAGGGGGCGACTGAGTCCCGCcGCGCAGCGCGCGAGGGCAGCCACAGGAGCGACTGA
GGAAGTTCCCTGACCTGGCGTCAACTCCACTGATCCGTAAGGGGATCGCGGGAGTGGATA
721 + + + + + + 780
CCTTCAAGGGACTGGACCGCAGTTGAGGTGACTAGGCATTCCCCTAGCGCCCTCACCTAT
CGGGTCAGGTCGTGCACGATCGTGGCACCAGACAGATCACCACGTCGATAGGCACTCGTG
781 + + + + + + 840
GCCCAGTCCAGCACGTGCTAGCACCGTGGTCTGTCTAGTGGTGCAGCTATCCGTGAGCAC
AGCCGCGCCCGGGGCTCGACGGGGCGGGGCACCGGCAGGGGCGGCCGCGTGATCAGCCGG
841 + + + + + + 900
TCGGCGCGGGCCCCGAGCTGCCCCGCCCCGTGGCCGTCCCCGCCGGCGCACTAGTCGGCC
AGCCTGTCCGGGGGCGTGCGTGCGGGGCGTCAGCTGTCGATGTCGGGAACGCCAGGGACG
901 + + + + + + 960
TCGGACAGGCCCCCGCACGCACGCCCCGCAGTCGACAGCTACAGCCCTTGCGGTCCCTGC
-6-* *SDIDPVGPV-
TCGATCTCGGTGCGGGCGTAGTGGTTGAAGTAGTTGGTGTAGAGGTTCACGGCCACGTGG
961 + + + + + + 1020
AGCTAGAGCCACGCCCGCATCACCAACTTCATCAACCACATCTCCAAGTGCCGGTGCACC
-6 DIETRAYHNFYNTYLNVAVH
ACGAAGACCTCGGCGAGCTCGGTGTCCGTCCATCCCTGTGCCACGGCCGCGTTCCACGAG
1021 + + + + + + 1080
TGCTTCTGGAGCCGCTCGAGCCACAGGCAGGTAGGGACACGGTGCCGGCGCAAGGTGCTC
-6 VFVEALETDTWGQAVAANWS
GCGTCAGACGCCTCGCCCACTTCGCCGGCGATCTCCCTGGCCACCTGGACCAGTGCTTCG
1081 + + + + + + 1140
CGCAGTCTGCGGAGCGGGTGAAGCGGCCGCTAGAGGGACCGGTGGACCTGGTCACGAAGC
-6 ADSAEGVEGAIERAVQVLAE-
AGCTTCACGTCGTCGCCGGGCGTCCCCCGGCGAATCGCCACGGTCTCCTCCAGCGTGAAA
1141 + + + + + + 1200
TCGAAGTGCAGCAGCGGCCCGCAGGGGGCCGCTTAGCGGTGCCAGAGGAGGTCGCACTTT
-6 LKVDDGPTGRRIAVTEELTF-
CCCGCGACCTTCGCCGACACCGTGTGCGCCGCCTGGCAGTACGCGCACGCGTCGACCGCG
1201 + + + + + + 1260
GGGCGCTGGAAGCGGCTGTGGCACACGCGGCGGACCGTCATGCGCGTGCGCAGCTGGCGC
-6 GAVKASVTHAAQCYACADVA
CCCACGGCGAGGGCGATCGCCTCGCGTGTGCGGGCGTCGAACGTTCCATGTTCGGCGACG
1261 + + + + + + 1320
GGGTGCCGCTCCCGCTAGCGGAGCGCACACGCCCGCAGCTTGCAAGGTACAAGCCGCTGC
-6 GVALAIAERTRADFTGHEAV
GCTCCGGTGATCGCGGCGTAGGTTTCCAGGACCACGGGGGAATGGGCCATTCCCCCGTGG
1321 + + + + + + 1380
CGAGGCCACTAGCGCCGCATCCAAAGGTCCTGGTGCCCCCTTACCCGGTAAGGGGGCACC
-6 AGTIAAYTELVVPSHAMGGH-
ATGTTGAGCACTCGCCCGAACCGCTTCTCCAGTCGGCGCAGGATGTCTCCGCCGGCTGCG
1381 + + + + + + 1440
TACAACTCGTGAGCGGGCTTGGCGAAGAGGTCAGCCGCGTCCTACAGAGGCGGCCGACGC
-6 INLVRGFRKELRRLIDGGAA-
GGTGCGGTGTCGATGGTGTGGACGGGAATCCGCGGCATGGGAATGCCTCTCCTCGTAGTG
1441 + + + + + + 1500
CCACGCCACAGCTACCACACCAGCCCTTAGGCGCCGTACCCTTACGGAGAGGAGCATCAC
-6-< PATDITHVPIRPM
ATGGGAGTTCCTCGTCCCTCCAGTCTGCCCAAGCACCTCCCCCGGTGAGCTGTCCCGGCC
1501 + + + + + + 1560
TACCCTCAAGGAGCAGGGAGGTCAGACGGGTTCGTGGAGGGGGCCACTCGACAGGGCCGG
GCCCTCCGGCCCCTTCTAGGCAGGTCGCCCGGTGGTGCGGCCCCAGGACGTCACCTCGCC
1561 + + + + + + 1620
CGGGAGGCCGGGGAAGATCCGTCCAGCGGGCCACCACGCCGGGGTCCTGCAGTGGAGCGG
GCACCACCGGGAGCCCCGAGGGGCGAGGTCAGAGGCCGAGCACCTCCTCGGCCAGGGCGG
1621 + + + + + + 1680
CGTGGTGGCCCTCGGGGCTCCCCGCTCCAGTCTCCGGCTCGTGGAGGAGCCGGTCCCGCC
-5-* * LGLVEEALA
TGCCCCGAACACGGGCCTCGATCTTGGCGAAGGCCAGGTCGCGTGTGGTGGAGGTGTCGT
1681 + + + + + + 1740
ACGGGGCTTGTGCCCGGAGCTAGAACCGCTTCCGGTCCAGCGCACACCACCTCCACAGCA
-5 TGRVRAE I KAFALDRTTSTD
CGGCGAACGGGGAGAAGCCGCAGTCGTCGCAGGTTCCCAGTTGCTCGACGGGGATGTAGC
1741 + + + + + + 1800
GCCGCTTGCCCCTCTTCGGCGTCAGCAGCGTCCAAGGGTCAACGAGCTGCCCCTACATCG
-5 DAFPSFGCDDCTGLQEVPIY
GGGCGGCGAGCAGGATGCGGTCGCGTACCTGCTCGGGGGTCTCGACCACTGGGTCGATCG
1801 + + + + + + 1860
CCCGCCGCTCGTCCTACGCCAGCGCATGGACGAGCCCCCAGAGCTGGTGACCCAGCTAGC
-5 RAALLIRDRVQEPTEVVPDI
GGTCGGTCACCCCGAGGAAGACGCGGGCGGCAGGGGGCAGGTGGTCACGGACGATGCTCA
1861 + + + + + + 1920
CCAGCCAGTGGGGCTCCTTCTGCGCCCGCCGTCCCCCGTCCACCAGTGCCTGCTACGAGT
-5 PDTVGLFVRAAPPLHDRVIS
GGACCCGCTCGGGGTCCGCTTCGCCGGCCAGTTCGAGATAGAAGTTGCCCGCCTTGAGCT
1921 + + + + + + 1980
CCTGGGCGAGCCCCAGGCGAAGCGGCCGGTCAAGCTCTATCTTCAACGGGCGGAACTCGA
-5 LVREPDAEGALELYFNGAKL
GGAAGAGCTTGGGCAGCAGTTCGGCGTAGTCGATGTCGAGGCTGTGCGTGGAGTCCTGGT
1981 + + + + + + 2040
CCTTCTCGAACCCGTCGTCAAGCCGCATCAGCTACAGCTCCGACACGCACCTCAGGACCA
-5 QFLKPLLEAYDIDLSHTSDQ
CGCCGCCGGGGCAGGTGTGTACGCCGATGCGGGCGGTTTCCTCGGCGCTGAAGCGCCCCA
2041 + + + + + + 2100
GCGGCGGCCCCGTCCACACATGCGGCTACGCCCGCCAAAGGAGCCGCGACTTCGCGGGGT
-5 DGGPCTHVGIRATEEASFRG
GGACTTCGTTGTTGAGGGCGATGAAGTCGTCGAGGACGCCGCCGCTGGGGTCGAGCTTGA
2101 + + + + + + 2160
CCTGAAGCAACAACTCCCGCTACTTCAGCAGCTCCTGCGGCGGCGACCCCAGCTCGAACT
-5 LVENNLAI FDDLVGGS PDLK
GGGACAGCCGCCCCTCGGTGAAGTCGAGCTGGACCACGTGTGCCCCCGCGTCCAGGCAGC
2161 + + + + + + 2220
CCCTGTCGGCGGGGAGCCACTTCAGCTCGACCTGGTGCACACGGGGGCGCAGGTCCGTCG
-5 LSLRGETFDLQVVHAGADLC
CTCGGATGTCGGCTTCGGCCTCGTCGGCGAGGTCGCGCAGGAACTGCTCGCGGGGGTAGC
2221 + + + + + + 2280
GAGCCTACAGCCGAAGCCGGAGCAGCCGCTCCAGCGCGTCCTTGACGAGCGCCCCCATCG
-5 GRIDAEAEDALDRLFQERPY
CCTCGATGGGAGTGGCGGGGTAGAGGAGGCTGAGGGCGGAGGGTGCGATGACCGCCTGCT
2281 + + + + + + 2340
GGAGCTACCCTCACCGCCCCATCTCCTCCGACTCCCGCCTCCCACGCTACTGGCGGACGA
-5 GE I PTAPYLLSLAS PAIVAQ
TCAGGGGGCGGTCCGTGAGCTGCCGTGCGGCGCGCAGATAGGTTTCGGCCCGCACCTGGT
2341 + + + + + + 2400
AGTCCCCCGCCAGGCACTCGACGGCACGCCGCGCGTCTATCCAAAGCCGGGCGTGGACCA
-5 KLPRDTLQRAARLYTEARVQ
AGCGGAAGGGCCCTTGGGTGATGCTGGGGAGCTGCCGGGTGTGCCCGTCTGCGAAGGGGA
2401 + + + + + + 2460
TCGCCTTCCCGGGAACCCACTACGACCCCTCGACGGCCCACACGGGCAGACGCTTCCCCT
-5 YRFPGQTI SPLQRTHGDAFP
TGACAGCGCCGTCGGGCGAGAGGGTGTCGAGGCCGGTCACGGGGTAGGTGGCGAAGCTCG
2461 + + + + + + 2520
ACTGTCGCGGCAGCCCGCTCTCCCACAGCTCCGGCCAGTGCCCCATCCACCGCTTCGAGC
-5 IVAGDPSLTDLGTVPYTAFS
GCTTGGACTGTTCACCGTCCACGAGGACGGGGCTGCCGACTCGTTCCAGTCGTGTCAGGG
2521 + + + + + + 2580
CGAACCTGACAAGTGGCAGGTGCTCCTGCCCCGACGGCTGAGCAAGGTCAGCACAGTCCC
-5 PKSQEGDVLVPSGVRELRTL
TGTCCGCGACGGCCTGTTCCTGCTGTTTGGCCAGGTCCGTGGCGTCCAGGGTTCCCTGGG
2581 + + + + + + 2640
ACAGGCGCTGCCGGACAAGGACGACAAACCGGTCCAGGCACCGCAGGTCCCAAGGGACCC
-5 TDAVAQEQQKALDTADLTGQ
CATGCGCGGCAAGGGCGTGCAGGAGTGTCGCGGAGCGCGGAAGGCTGCCGATCGGCTCAG
2641 + + + + + + 2700
GTACGCGCCGTTCCCGCACGTCCTCACAGCGCCTCGCGCCTTCCGACGGCTAGCCGAGTC
-5 AHAALAHLLTASRPLSGIPE
TGGCGATGGTCATGGCCGAAGAGTAGGGAAGAGGCTGGGTTTCGAACCACCGCAAAGCTT
2701 + + + + + + 2760
ACCGCTACCAGTACCGGCTTCTCATCCCTTCTCCGACCCAAAGCTTGGTGGCGTTTCGAA
-5-< T A I T M -
TGATTGCCGCTTTTTCAGGGGAAGTTGATGCGAAGTCGCCGAGCGGCGGAACGTGCTGAT
2761 + + + + + + 2820
ACTAACGGCGAAAAAGTCCCCTTCAACTACGCTTCAGCGGCTCGCCGCCTTGCACGACTA
GTATGGGGGGCGGGAGGAGCCTGCGGGGTTCTAGGAGCCGGTCGCGGCCACGGTGGAGGA
2821 + + + + + + 2880
CATACCCCCCGCCCTCCTCGGACGCCCCAAGATCCTCGGCCAGCGCCGGTGCCACCTCCT
-4-* *SGTAAVTSS-
GGTGCCCAGCTGGGAGCGGGGGGTCTTTTCGCCGACGCGGTTGGGCTCGATGGTGCGGGG
2881 + + + + + + 2940
CCACGGGTCGACCCTCGCCCCCCAGAAAAGCGGCTGCGCCAACCCGAGCTACCACGCCCC
-4 TGLQSRPTKEGVRNPEITRP-
GTCGACGGCCTCTCCGGGGGCACCTTGCCGGTAGACGCCTTCGGGGTCGGAGTCCCGGTC
2941 + + + + + + 3000
CAGCTGCCGGAGAGGCCCCCGTGGAACGGCCATCTGCGGAAGCCCCAGCCTCAGGGCCAG
-4 DVAEGPAGQRYVGE PDSDRD-
ATGGGGGAGCAGGAAGAAGACCCGGCGCCGGTACAGACCGCTGTCCGGGTCCGCTTCGGC
3001 + + + + + + 3060
TACCCCCTCGTCCTTCTTCTGGGCCGCGGCCATGTCTGGCGACAGGCCCAGGCGAAGCCG
-4 HPLLFFVRRRYLGSDPDAEA-
GTCGGCCCCGAGTTCGATGTAGCCGATCATGCGGCCGTCGCGGGCGTAGCGCGGCTTGTT
3061 + + + + + + 3120
CAGCCGGGGCTCAAGCTACATCGGCTAGTACGCCGGCAGCGCCCGCATCGCGCCGAACAA
-4 DAGLE IYGIMRGDRAYRPKN-
CTTGCGCCGGGGGGTCTTGTCCAGGGCCTGGCGGACGTAGTCGAGTCCCTCGGGATCTTC
3121 + + + + + + 3180
GAACGCGGCCCCCCAGAACAGGTCCCGGACCGCCTGCATCAGCTCAGGGAGCCCTAGAAG
-4 KRRPTKDLAQRVYDLGEPDE-
GAGCCACACGACCTTCGCCTCGTGAACGAGATCGCTGTCGGTCAGTAGCGAGCTCATGGC
3181 + + + + + + 3240
CTCGGTGTGCTGGAAGCGGAGCACTTGCTCTAGCGACAGCCAGTCATCGCTCGAGTACCG
-4-< LWVVKAEHVLDSDTLLS S M -
GGCGACCTCTCCTTCGTCGGCGTGCACCGGGTGGGGAAGCGGTGCCTGCGTGATGTGTGT
3241 + + + + + + 3300
CCGCTGGAGAGGAAGCAGCCGCACGTGGCCCACCCCTTCGCCACGGACGCACTACACACA
TCGTCTGCGGCGGTGGGCCGCAGTGGTGCGGACCGCCCGTGGTGCCGGTTCTCGGCCAAA
3301 + + + + + + 3360
AGCAGACGCCGCCACCCGGCGTCACCACGCCTGGCGGGCACCACGGCCAAGAGCCGGTTT
GCACGGGCAGGTACGTCCTGGGGCACTCACATCGTAGATGGGGTCCGCTTCCGCAGGGCA
3361 + + + + + + 3420
CGTGCCCGTCCATGCAGGACCCCGTGAGTGTAGCATCTACCCCAGGCGAAGGCGTCCCGT
GTGCCTCCGGTCGGAGGACGTTCATTCGTCGGCTGCCAGAGCGAGGTTGGGGTAGAACTT
3421 + + + + + + 3480
CACGGAGGCCAGCCTCCTGCAAGTAAGCAGCCGACGGTCTCGCTCCAACCCCATCTTGAA
-3-* *EDAALALNPYFK-
CCGGCCGTTGGATTTGATCATGTCGGCAGGTGAGGCGAGGCCCACTTCCTGGCGGACCCG
3481 + + + + + + 3540
GGCCGGCAACCTAAACTAGTACAGCCGTCCACTCCGCTCCGGGTGAAGGACCGCCTGGGC
-3 RGNS KIMDAPSALGVEQRVR-
GGTGGCGAAGGCACGGGCGGTCCCGGGGCGGATGCCTTCACTGTGTGCGCACCAGGTGCT
3541 + + + + + + 3600
CCACCGCTTCCGTGCCCGCCAGGGCCCCGCCTACGGAAGTGACACACGCGTGGTCCACGA
-3 TAFARATGPRIGESHACWTS-
GTAGGACGTGTAGAGAAGGCCCTGTTCGACGCGTAGCTCGCTGTTCTCGGGGTCGTGGAG
3601 + + + + + + 3660
CATCCTGCACATCTCTTCCGGGACAAGCTGCGCATCGAGCGACAAGAGCCCCAGCACCTC
-3 YSTYLLGQEVRLESNE PDHL-
GCAGCACTCGGCGAGGAAGCGGCCGATGTGGTCCTCGGTGTTCGCGTATGCGCTGGTGGC
3661 + + + + + + 3720
CGTCGTGAGCCGCTCCTTCGCCGGCTACACCAGGAGCCACAAGCGCATACGCGACCACCG
-3 CCEALFRG I HDE TNAYASTA-
GATGCGGACCCGGTCGGGGCCGGCGAGTGTGTCGCGGGTGGCGAGGTAGCGGCGGGCCCC
3721 + + + + + + 3780
CTACGCCTGGGCCAGCCCCGGCCGCTCACACAGCGCCCACCGCTCCATCGCCGCCCGGGG
-3 I RVRD PGALTDRTALYRRAG-
TTCGGTGAGCCAGTGCAGGATCCCGGGGCCCTCGTCCTGGACGAGTTCGACAGCCAGGTT
3781 + + + + + + 3840
AAGCCACTCGGTCACGTCCTAGGGCCCCGGGAGCAGGACCTGCTCAAGCTGTCGGTCCAA
-3 ETLWHLIGPGEDQVLEVALN-
GTCGATCTTGCGTTCGTCGGGGACGATCCGTTCGAAGGGCAGGAGGCGGATGCGGCGCCA
3841 + + + + + + 3900
CAGCTAGAACGCAAGCAGCCCCTGCTAGGCAAGCTTCCCGTCCTCCGCCTACGCCGCGGT
-3 DIKREDPVIREFPLLRIRRW-
GAAGGCGAAGCCGCCGGTGGAGACCTCGGGGCGGTGGTTGCCCAGCAGCCACAGCTTGTG
3901 + + + + + + 3960
CTTCCGCTTCGGCGGCCACCTCTGGAGCCCCGCCACCAACGGGTCGTCGGTGTCGAACAC
-3 FAFGGTSVE PRHNGLLWLKH-
CGTGGGTGTGAAGGAGAAATAGTCCTGCCGCATGCGGCGGGCCTTGATCTTGTCACCGCC
3961 + + + + + + 4020
GCACCCACACTTCCTCTTTATCAGGACGGCGTACGCCGCCCGGAACTAGAACAGTGGCGG
-3 TPTFS FYDQRMRRAKI KDGG-
GGTCAGCAGGCGGACGCGCGCCTCGTCGAAGCGGTCGTTGGGCTTGAGCTCGCTGCACAC
4021 + + + + + + 4080
CCAGTCGTCCGCCTGCGCGCGGAGCAGCTTCGCCAGCAACCCGAACTCGAGCGACGTGTG
-3 TLLRVRAEDFRDNPKLESCV-
GATGAGGCGGCGGCCGTGGAGTTCGGTGAGCTCGGTGGAGTGTTCGGAGTATGCGCCACG
4081 + + + + + + 4140
CTACTCCGCCGCCGGCACCTCAAGCCACTCGAGCCACCTCACAAGCCTCATACGCGGTGC
-3 I LRRGHLETLETSHESYAGR-
GTCCATGAGGAAACCCGGCGGGGCTGCGTCGGCGTAGTCGCCGAGAATCTGGATCATCAC
4141 + + + + + + 4200
CAGGTACTCCTTTGGGCCGCCCCGACGCAGCCGCATCAGCGGCTCTTAGACCTAGTAGTG
-3 DMLFGPPAADAYDGLIQIMV-
GTCGAGGAGAACGGATTTGCCGTTCTTTCCCTGGCCGTGGAGAAAGGGCAGCACCTGCGC
4201 + + + + + + 4260
CAGCTCCTCTTGCCTAAACGGCAAGAAAGGGACCGGCACCTCTTTCCCGTCGTGGACGCG
-3 DLLVS KGNKGQGHLF P LVQA-
CCCGACGTCACCGGTGATGGAGTAGCCGAGAAGGAGGTGGAGGAAGTCGATCATCTCCCG
4261 + + + + + + 4320
GGGCTGCAGTGGCCACTACCTCATCGGCTCTTCCTCCACCTCCTTCAGCTAGTAGAGGGC
-3 GVDGTI SYGLLLHLFDIMER-
CCCTTCGGCGTCACTGCCGAAGGTGTCTTCGAGGAAACGGTGCCAGCGGGGGGTGGGGAT
4321 + + + + + + 4380
GGGAAGCCGCAGTGACGGCTTCCACAGAAGCTCCTTTGCCACGGTCGCCCCCCACCCCTA
-3 GEADSGFTDELFRHWRPTPI-
GTCCTGGGGGGAGGCGCTGGTGGCGCGGGAGTGGAAGTCCCGGGTGGGGTCGGGCTTGCG
4381 + + + + + + 4440
CAGGACCCCCCTCCGCGACCACCGCGCCCTCACCTTCAGGGCCCACCCCAGCCCGAACGC
-3 DQ P SASTARS HFDRT PD PKR-
CATACGGCCGTTGCGGAGGTCGACCACTCCGTCAGGGGTGCACAGGGCGTAGGGGTCTCC
4441 + + + + + + 4500
GTATGCCGGCAACGCCTCCAGCTGGTGAGGCAGTCCCCACGTGTCCCGCATCCCCAGAGG
-3 MRGNRLDVVGDPTCLAYPDG-
GTCGAGGGTGTCGGGATCGAGGGAGAGGTCGGGAGAGGCCTTTGCCTGGGTGAGGAGCGC
4501 + + + + + + 4560
CAGCTCCCACAGCCCTAGCTCCCTCTCCAGCCCTCTCCGGAAACGGACCCACTCCTCGCG
-3 DLTD PDL S LD P SAKAQTLLA-
CTTCATACCGGTCGTCGACAGGGTGCGGCGTTTGTGGTGGTGCAGTTCCCGGTCGGTGAA
4561 + + + + + + 4620
GAAGTATGGCCAGCAGCTGTCCCACGCCGCAAACACCACCACGTCAAGGGCCAGCCACTT
-3 KMGTTSLTRRKHHHLERDTF-
CAGCCCGCGGGGATCGCTGCCGGGCATCTCCTCCGCCATCTCTCCGGCAGCCCACAGGGC
4621 + + + + + + 4680
GTCGGGCGCCCCTAGCGACGGCCCGTAGAGGAGGCGGTAGAGAGGCCGTCGGGTGTCCCG
-3 LGRPDSGPMEEAMEGAAWLA-
AGCTTTCTCGCCTCCGGCCCGCTTCCACCGGTAGCCGTCCCAGGAGTACCAGCCCAGGCC
4681 + + + + + + 4740
TCGAAAGAGCGGAGGCCGGGCGAAGGTGGCCATCGGCAGGGTCCTCATGGTCGGGTCCGG
-3 AKEGGARKWRYGDWSYWGLG-
CTCCACGTGCCGGAACTGGTCACGGTAGAGACGGACGAAGAGCTTGGCGTTGCCGCGGTC
4741 + + + + + + 4800
GAGGTGCACGGCCTTGACCAGTGCCATCTCTGCCTGCTTCTCGAACCGCAACGGCGCCAG
-3 EVHRFQDRYLRVFLKANGRD-
GGTCAGGCTGGCGGGAATCTCGCCCGCCTCCCAGGCGGTCGCGGCGACGGGGGCCTCGGG
4801 + + + + + + 4860
CCAGTCCGACCGCCCTTAGAGCGGGCGGAGGGTCCGCCAGCGCCGCTGCCCCCGGAGCCC
-3 TLSAPIEGAEWATAAVPAEP-
AGCGGCCTGGACAGGGAGGAGCGGCGCTGGGGCCGGGGTGGTTTCGAGGGCCAGCATCTG
4861 + + + + + + 4920
TCGCCGGACCTGTCCCTCCTCGCCGCGACCCCGGCCCCACCAAAGCTCCCGGTCGTAGAC
-3 AAQVPLLPAPAPTTELALMQ-
CTGAGCGGCGGCAGTTGCGTCAAAGCGAGGGCCCTCGGCGCTGCTGCTCATGGACGTCCT
4921 + + + + + + 4980
GACTCGCCGCCGTCAACGCAGTTTCGCTCCCGGGAGCCGCGACGACGAGTACCTGCAGGA
-3-< QAAATADFRPGEASSSM-
TCGAGATGGAGCGGTCGGGCGGTCCCCGCTGCGGGAACGGCATGAATGATCTTCCCGGTG
4981 + + + + + + 5040
AGCTCTACCTCGCCAGCCCGCCAGGGGCGACGCCCTTGCCGTACTTACTAGAAGGGCCAC
CGGACAGAGTGCCAGGGGCAGCGCATGTGCGGGGGGACAACGGCCCGTTTCGGACGAGGG
5041 + + + + + + 5100
GCCTGTCTCACGGTCCCCGTCGCGTACACGCCCCCCTGTTGCCGGGCAAAGCCTGCTCCC
CCGGCCGACGGGGGGAAGCAGGGGCCGGCAACCGGGTGGCGGGGCGGCGTGAGCGAGGGC
5101 + + + + + + 5160
GGCCGGCTGCCCCCCTTCGTCCCCGGCCGTTGGCCCACCGCCCCGCCGCACTCGCTCCCG
ACGAGCGGCCCGGTACGGGGGGAAGGGCTCGTCTCTCCGTGGGGCGGCACGTTGTGGTCC
5161 + + + + + + 5220
TGCTCGCCGGGCCATGCCCCCCTTCCCGAGCAGAGAGGCACCCCGCCGTGCAACACCAGG
TCGTCCGTCAGCTTGCGTCTGGCTTCAGCCTCCTGACCCCCAATAAGGCGAAAGCTGCTG
5221 + + + + + + 5280
AGCAGGCAGTCGAACGCAGACCGAAGTCGGAGGACTGGGGGTTATTCCGCTTTCGACGAC
GTCAAGCATCTTTCGTGACACTCGGCGAGGGACTGAAGGGACTGTCTTTCGGAATGAGTG
5281 + + + + + + 5340
CAGTTCGTAGAAAGCACTGTGAGCCGCTCCCTGACTTCCCTGACAGAAAGCCTTACTCAC
TAGGGGGTTGTCGGGTGGGGACCGCGCCTCGACTCCCCGGCGGACGGGATCTGTTCGGTC
5341 + + + + + + 5400
ATCCCCCAACAGCCCACCCCTGGCGCGGAGCTGAGGGGCCGCCTGCCCTAGACAAGCCAG
GGTCCCTTGGGTCCCTCCCCGGATCGCGGCAGGGACCCAAGGGGGCGGTGCGGCGGGCGG
5401 + + + + + + 5460
CCAGGGAACCCAGGGAGGGGCCTAGCGCCGTCCCTGGGTTCCCCCGCCACGCCGCCCGCC
TCGGTGAGGGGCCCCGGTGGAGGGACTGAGGGTCTGTATGGAGCGATAAGAGGGTCTGAA
5461 + + + + + + 5520
AGCCACTCCCCGGGGCCACCTCCCTGACTCCCAGACATACCTCGCTATTCTCCCAGACTT
GGGGCGGAGAGAGTTTCGGTCCCTGCGTTGAGTCCCTGGTCATCACCGCAGGTCAGAGGG
5521 + + + + + + 5580
CCCCGCCTCTCTCAAAGCCAGGGACGCAACTCAGGGACCAGTAGTGGCGTCCAGTCTCCC
GTTTTGAGGGGTGAAAAAGGGACTGAAGGGACTCAACTTCCCCATTATGAGCTGAGTAGA
5581 + + + + + + 5640
CAAAACTCCCCACTTTTTCCCTGACTTCCCTGAGTTGAAGGGGTAATACTCGACTCATCT
AGAAAGCAGTATGACGATATCGGCGCCTACATACGCGCGCGTACATAGTGAGCTTATAAT
5641 + + + + + + 5700
TCTTTCGTCATACTGCTATAGCCGCGGATGTATGCGCGCGCATGTATCACTCGAATATTA
GCGGAAGTTGAGTCCCTTCAGTCCCTTTTCGTGGGGTCGTATCCCCTCTGACTGCGTTGA
5701 + + + + + + 5760
CGCCTTCAACTCAGGGAAGTCAGGGAAAAGCACCCCAGCATAGGGGAGACTGACGCAACT
CCGTCGCCGCTCCGCGCAGGGACCGAAGAGGGACCAAGTCCCTGCGCGGGGCGGGCGACG
5761 + + + + + + 5820
GGCAGCGGCGAGGCGCGTCCCTGGCTTCTCCCTGGTTCAGGGACGCGCCCCGCCCGCTGC
GTAATCGTGCAGTGCCCCCTCCCCCGTTTCCCACAGCGAGTCGTCGCTCCCCTGTGAGGC
5821 + + + + + + 5880
CATTAGCACGTCACGGGGGAGGGGGCAAAGGGTGTCGCTCAGCAGCGAGGGGACACTCCG
CGGAGAGGGTCCTAGAACCCCTCAGGGGCCGTTCTGTGGCCCTCTGGGCCTCCTCCTGGC
5881 + + + + + + 5940
GCCTCTCCCAGGATCTTGGGGAGTCCCCGGCAAGACACCGGGAGACCCGGAGGAGGACCG
CATTTACCCCATGGGGGCGCTTGGGGGCGTCAGGAGGGCTTGTGAGGGCTCTGCCGGGAA
5941 + + + + + + 6000
GTAAATGGGGTACCCCCGCGAACCCCCGCAGTCCTCCCGAACACTCCCGAGACGGCCCTT
-2-> MRALPGS-
GTGGCGGATTGCGCATGGCAGGAGATGCCCCGACAGCGGCCGGGAATCGACGATGTCCCC
6001 + + + + + + 6060
CACCGCCTAACGCGTACCGTCCTCTACGGGGCTGTCGCCGGCCCTTAGCTGCTACAGGGG
-2 GGLRMAGDAPTAAGNRRCPP-
CGACCCCTATCCAGCGTCCGCTGATCCTCAGGAGGCAGACCTTGCAGGCTCCAGAAGCGA
6061 + + + + + + 6120
GCTGGGGATAGGTCGCAGGCGACTAGGAGTCCTCCGTCTGGAACGTCCGAGGTCTTCGCT
-2 TP I QRPL ILRRQTLQAPEAK-
AGAACGGCCGGTCCCCGGAGCAGCCGCAGGAAGAGCGGATCGTCCTGGACGTATGGCTGG
6121 + + + + + + 6180
TCTTGCCGGCCAGGGGCCTCGTCGGCGTCCTTCTCGCCTAGCAGGACCTGCATACCGACC
-2 NGRSPEQPQEERIVLDVWLA-
CGAACTACCCGTTCCCCACCTATGACGGGCGTGACTTCCTCGCTCCGCTGCGCGAGCGGG
6181 + + + + + + 6240
GCTTGATGGGCAAGGGGTGGATACTGCCCGCACTGAAGGAGCGAGGCGACGCGCTCGCCC
-2 NYPFPTYDGRDFLAPLRERA-
CGGCGGAGTTCGAGCGCGCCCACCCCCGATACCGGGTCGACATCAACGGCCACGACTTCT
6241 + + + + + + 6300
GCCGCCTCAAGCTCGCGCGGGTGGGGGCTATGGCCCAGCTGTAGTTGCCGGTGCTGAAGA
-2 AEFERAHPRYRVDINGHDFW-
GGACCATCCCCGAGAAGGTGGCGCGCGCCACCGCGGAGGGCAGGCCTCCGCACATAGCGG
6301 + + + + + + 6360
CCTGGTAGGGGCTCTTCCACCGCGCGCGGTGGCGCCTCCCGTCCGGAGGCGTGTATCGCC
-2 TIPEKVARATAEGRPPHIAG-
GCTACTACGCCACCGACAGCCAGTTGGCGCGGGACGCGCGCAGGCCCGACGGGAAGCCGG
6361 + + + + + + 6420
CGATGATGCGGTGGCTGTCGGTCAACCGCGCCCTGCGCGCGTCCGGGCTGCCCTTCGGCC
-2 YYATDS QLARDARR PDGKPV-
TCTTCACCTCGGTGGAGGCCGCGTTGGCCGGCCGGACGGAGATACTGGGACACCCGGTGG
6421 + + + + + + 6480
AGAAGTGGAGCCACCTCCGGCGCAACCGGCCGGCCTGCCTCTATGACCCTGTGGGCCACC
-2 FT SVEAALAGRTE I LGH PVV-
TGGTGGAGGACCTCGACCCCGTGGTGCGCGACTCCTACTCGTTCGGGGGCGAGTTGGTGT
6481 + + + + + + 6540
ACCACCTCCTGGAGCTGGGGCACCACGCGCTGAGGATGAGCAAGCCCCCGCTCAACCACA
-2 VEDLDPVVRDSYS FGGELVS-
CGCTGCCGCTCACGGTCACCACCATGCTCTGCTACGCCAACTCCTCCCTCCTCGCGCGCG
6541 + + + + + + 6600
GCGACGGCGAGTGCCAGTGGTGGTACGAGACGATGCGGTTGAGGAGGGAGGAGCGCGCGC
-2 LPLTVTTMLCYANS SLLARA-
CCGGTGTTCCGGAGTTGCCCCGTACCTGGGATGAGGTCGAAGCAGCCTGCCAGGCGGTGG
6601 + + + + + + 6660
GGCCACAAGGCCTCAACGGGGCATGGACCCTACTCCAGCTTCGTCGGACGGTCCGCCACC
-2 GVPELPRTWDEVEAACQAVA-
CCAGCGTCGACGGGGGGCCCGGTCACGGAATCACCTGGGCCAACGACGGCTGGGTTTTCC
6661 + + + + + + 6720
GGTCGCAGCTGCCCCCCGGGCCAGTGCCTTAGTGGACCCGGTTGCTGCCGACCCAAAAGG
-2 S VDGG PGHG I TWANDGWV F Q -
AGCAGGCCGTCGCCCTTCAGAACGGGGTGCTGACCGATCAGGACAACGGCCGCTCCGGCT
6721 + + + + + + 6780
TCGTCCGGCAGCGGGAAGTCTTGCCCCACGACTGGCTAGTCCTGTTGCCGGCGAGGCCGA
-2 QAVALQNGVLTDQDNGRS G S -
CCGCCACGACGGTGGACGTCACATCGGACGAGATGCTGGACTGGGTCCGCTGGTGGACGC
6781 + + + + + + 6840
GGCGGTGCTGCCACCTGCAGTGTAGCCTGCTCTACGACCTGACCCAGGCGACCACCTGCG
-2 ATTVDVTSDEMLDWVRWWTH-
ACCTCCATGAGCGCGGCCATTACCTCTACACGGGCGGGCCCTCGGACTGGGGCGGGGCGT
6841 + + + + + + 6900
TGGAGGTACTCGCGCCGGTAATGGAGATGTGCCCGCCCGGGAGCCTGACCCCGCCCCGCA
-2 LHERGHYLYTGG P SDWGGAF-
TCGAGGCTTTCGTCCAGCAGAAGGTCGCATTCACCTTCGACTCGTCCAAGGCCGCCCGGG
6901 + + + + + + 6960
AGCTCCGAAAGCAGGTCGTCTTCCAGCGTAAGTGGAAGCTGAGCAGGTTCCGGCGGGCCC
-2 EAFVQQKVAFTFDS S KAARE-
AACTCATCCAGGCCGGTGCACAGGCCGGTTTCGAGGTCGCGGTGTTCCCGTTGCCCAGGA
6961 + + + + + + 7020
TTGAGTAGGTCCGGCCACGTGTCCGGCCAAAGCTCCAGCGCCACAAGGGCAACGGGTCCT
-2 L I QAGAQAGFEVAVF PL PRN-
ACGCGAAGGCCCCGGTAGCGGGCCAGCCCGTCTCGGGAGACTCCCTGTGGCTGGCCGCGG
7021 + + + + + + 7080
TGCGCTTCCGGGGCCATCGCCCGGTCGGGCAGAGCCCTCTGAGGGACACCGACCGGCGCC
-2 AKAPVAGQPVSGD S LWLAAG-
GACTCGACGAGACCACGCAGGACGGGCTGCTCGCTCTCACCCAGTACCTGATCAGCCCGG
7081 + + + + + + 7140
CTGAGCTGCTCTGGTGCGTCCTGCCCGACGAGCGAGAGTGGGTCATGGACTAGTCGGGCC
-2 LDETTQDGLLALTQYLISPA-
CCAACGCCGCGGACTGGCACCGCACCAACGGTTTCGTACCGGTGACCGGCGCGGCCGGGG
7141 + + + + + + 7200
GGTTGCGGCGCCTGACCGTGGCGTGGTTGCCAAAGCATGGCCACTGGCCGCGCCGGCCCC
-2 NAADWHRTNG FVPVTGAAGE-
AACTGCTGGAAGCGACAGGCTGGTTCGACCGCCGGCCGCAGCAACGGGTGGCCGGGGAGC
7201 + + + + + + 7260
TTGACGACCTTCGCTGTCCGACCAAGCTGGCGGCCGGCGTCGTTGCCCACCGGCCCCTCG
-2 LLEATGWFDRRPQQRVAGEQ-
AGTTGAAGGCGTCCGACCGGTCACCGGCGGCGCTCGGCGCGCTGCTCGGCGACTTCGCGG
7261 + + + + + + 7320
TCAACTTCCGCAGGCTGGCCAGTGGCCGCCGCGAGCCGCGCGACGAGCCGCTGAAGCGCC
-2 LKASDRS PAALGALLGDFAA-
CCGTCAACGAGGTCATCACCGCAGCGATGGACGATGTCCTGCGCAGTGGAGCGGACCCCG
7321 + + + + + + 7380
GGCAGTTGCTCCAGTAGTGGCGTCGCTACCTGCTACAGGACGCGTCACCTCGCCTGGGGC
-2 VNEVI TAAMDDVLRSGAD P A -
CGAAGGCCTTCGCCGAAGCCGGCGTGGCCGCCCAGCAACTGCTCGATGCCTACAACGCCC
7381 + + + + + + 7440
GCTTCCGGAAGCGGCTTCGGCCGCACCGGCGGGTCGTTGACGAGCTACGGATGTTGCGGG
-2 KAFAEAGVAAQQLLDAYNAR-
GGAACCGCTCCGGATCCGGGACCCCCTCCGCCGTCTGAGATCCGGTACCGGGGCACAGGG
7441 + + + + + + 7500
CCTTGGCGAGGCCTAGGCCCTGGGGGAGGCGGCAGACTCTAGGCCATGGCCCCGTGTCCC
-2-* NRSGSGTPSAV* -
GCGCCGCCGCCCGCTTTCCCGGCGGGGCACTGGCCGGGGGACATGCTCTCCCGCCCCCGG
7501 + + + + + + 7560
CGCGGCGGCGGGCGAAAGGGCCGCCCCGTGACCGGCCCCCTGTACGAGAGGGCGGGGGCC
CAGGACGTAGGGTCAACCCGCCTGCGCCTTCAGGTGGCGGCGCAGATACTCACCGGTCAG
7561 + + + + + + 7620
GTCCTGCATCCCAGTTGGGCGGACGCGGAAGTCCACCGCCGCGTCTATGAGTGGCCAGTC
* GAQAKLHRRLYEGTL-
GGAGGAATCCGCGGCGAGCAGGTCCTTCGGTGTGCCGGTGAAGACGATCTCGCCGCCCTC
7621 + + + + + + 7680
CCTCCTTAGGCGCCGCTCGTCCAGGAAGCCACACGGCCACTTCTGCTAGAGCGGCGGGAG
-1 SSDAALLDKPTGTFVI EGGE-
CCGTCCCCCGTCGGGACCCAGGTCGATGATCCAGTCGGCCTGCTGCACCACATCGAGGTT
7681 + + + + + + 7740
GGCAGGGGGCAGCCCTGGGTCCAGCTACTAGGTCAGCCGGACGACGTGGTGTAGCTCCAA
-1 RGGDPGLDI IWDAQQVVDLN-
GTGCTCGATGACCACGACGGTGTTCCCGGCCTCGACGAGCCCGTCCAGGAGCTTCAGCAG
7741 + + + + + + 7800
CACGAGCTACTGGTGCTGCCACAAGGGCCGGAGCTGCTCGGGCAGGTCCTCGAAGTCGTC
-1 HE I VVVTNGAEVLGDLLKLL-
GGTGTCAACGTCCGACATGTGCAGCCCGGTGGTGGGCTCGTCCAGGACATAGACCGTGCC
7801 + + + + + + 7860
CCACAGTTGCAGGCTGTACACGTCGGGCCACCACCCGAGCAGGTCCTGTATCTGGCACGG
-1 TDVDSMHLGTTPEDLVYVTG-
CGTGCGGTGCAGCTGGTCGGCAAGTTTGATCCGCTGCAGTTCACCGCCGGAGAGGCTGGA
7861 + + + + + + 7920
GCACGCCACGTCGACCAGCCGTTCAAACTAGGCGACGTCAAGTGGCGGCCTCTCCGACCT
-1 TRHLQDALKIRQLEGGSLSS-
AAGCGGCTGGCCCAGGCTGAGGTACCCAAGACCGACGTCGACGAGAGCGCGCAGTTTCGG
7921 + + + + + + 7980
TTCGCCGACCGGGTCCGACTCCATGGGTTCTGGCTGCAGCTGCTCTCGCGCGTCAAAGCC
-1 LPQGLS LYGLGVDVLARLKP-
CAGCAGGGCCTTCTCGGTGAAGAACTCGACGGCCTCGTCGGCGGGCAGCTCCAGGACGTC
7981 + + + + + + 8040
GTCGTCCCGGAAGAGCCACTTCTTGAGCTGCCGGAGCAGCCGCCCGTCGAGGTCCTGCAG
-1 LLAKETFFEVAEDAPLELVD-
CGCGATCGACTTCCCGCGAAGCTGGTGCTCCAGGACCTCGGGCTTGAAGCGGCGCCCCTC
8041 + + + + + + 8100
GCGCTAGCTGAAGGGCGCTTCGACCACGAGGTCCTGGAGCCCGAACTTCGCCGCGGGGAG
-1 AISKGRLQHELVEPKFRRGE-
ACAGACACCGCAGTGCGTGGTCACCGGATCCATGAAGGCCAGCTCGGTGATGATGACCCC
8101 + + + + + + 8160
TGTCTGTGGCGTCACGCACCAGTGGCCTAGGTACTTCCGGTCGAGCCACTACTACTGGGG
-1 CVGCHTTVPDMFALET I I V G -
GCGGCCCTGGCACTCCTCGCACGACCCCTTGGAGTTGAAGCTGAACAGCGAGGCGTTCGC
8161 + + + + + + 8220
CGCCGGGACCGTGAGGAGCGTGCTGGGGAACCTCAACTTCGACTTGTCGCTCCGCAAGCG
-1 RGQCEECSGKSNFS FLSANA-
GCCGGTCTCCTTCGCGAACAGCTTGCGCAGCGGGTCCATCAGGCCGAGGTAGGAGACCGG
8221 + + + + + + 8280
CGGCCAGAGGAAGCGCTTGTCGAACGCGTCGCCCAGGTAGTCCGGCTCCATCCTCTGGCC
-1 GTEKAFLKRLPDMLGLYSVP-
TGTGGAGCGCGACGAGGCGGCGATCGCGGACTGGTCGACAAAGACCGCGTCGGGGTGCGC
8281 + + + + + + 8340
ACACCTCGCGCTGCTCCGCCGCTAGCGCCTGACCAGCTGTTTCTGGCGCAGCCCCACGCG
-1 TSRS SAAIASQDVFVADPHA-
CTCCATGAATGCCCCGGAGATCAGGCTGCTCTTGCCGGAACCCGCCACCCCGGTCACCGC
8341 + + + + + + 8400
GAGGTACTTACGGGGCCTCTAGTCCGACGAGAACGGCCTTGGGCGGTGGGGCCAGTGGCG
-1 EMFAGS ILSSKGSGAVGTVA-
GGTCAGCACACCGGTGGGCACGGCCACGGAGACCTGCTTCAGGTTGTGGAGATCCGCGTT
8401 + + + + + + 8460
CCAGTCGTGTGGCCACCCGTGCCGGTGCCTCTGGACGAAGTCCAACACCTCTAGGCGCAA
-1 TLVGT PVAVSVQKLNHLDAN-
CTCCACGGTCAGCTCCCCCGTGGGCGGGCGGACCTCCTCCTTCACGCGGGCCCCCCGCCG
8461 + + + + + + 8520
GAGGTGCCAGTCGAGGGGGCACCCGCCCGCCTGGAGGAGGAAGTGCGCCCGGGGGGCGGC
-1 EVTLEGTPPRVEEKVRAGRR-
CAGAGCCTCCCCGGTCCGGGTCTTCGCCTTCCGCAGCTTCGCGAAGGACCCCTCGAACAC
8521 + + + + + + 8580
GTCTCGGAGGGGCCAGGCCCAGAAGCGGAAGGCGTCGAAGCGCTTCCTGGGGAGCTTGTG
-1 LAEGTRTKAKRL KAF S GE FV-
GATCTCGCCCCCGTGCACTCCCGCCCCGGGACCGACATCGACGATGTGGTCGGCGATCTC
8581 + + + + + + 8640
CTAGAGCGGGGGCACGTGAGGGCGGGGCCCTGGCTGTAGCTGCTACACCAGCCGCTAGAG
-1 I EGGHVGAGPGVDVI HDAI E-
GATCACAtcGGGGTCGTGCTCGACGACCAGCACGGTGTTCCCCTTGTCGCGCAGCGCGCG
8641 + + + + + + 8700
CTAGTGTagCCCCAGCACGAGCTGCTGGTCGTGCCACAAGGGGAACAGCGCGTCGCGCGC
-1 I VD PDHEVVLVTNGKDRLAR-
CAGCAGGTCGTTGAGCCGCCCCACGTCGCGCGGGTGCAGGCCGATGCTGGGCTCGTCGAA
8701 + + + + + + 8760
GTCGTCCAGCAACTCGGCGGGGTGCAGCGCGCCCACGTCCGGCTACGACCCGAGCAGCTT
-1 LLDNLRGVDRPHLG I S PEDF-
GATGTACGTGAGCCCGGCCAGACCACTGCCGAGGTGGCGCACCATCTTCAGCCGCTGCCC
8761 + + + + + + 8820
CTACATGCACTCGGGCCGGTCTGGTGACGGCTCCACCGCGTGGTAGAAGTCGGCGACGGG
-1 I YTLGALGS GLHRVMKLRQG-
CTCGCCCCCCGAGAGGTCGGCCGTGGGCCTGTCCAGGGTCAGGTAGCCGAGCCCGATGGA
8821 + + + + + + 8880
GAGCGGGGGGCTCTCCAGCCGGCACCCGGACAGGTCCCAGTCCATCGGCTCGGGCTACCT
-1 EGGSLDATPRDLTLYGLGIS-
CACGATCCGCTCCAGGGCCGTGCGCGCGGCTTTCGCGAGAGGGGCAGCGGCCGGCTCCGT
8881 + + + + + + 8940
GTGCTAGGCGAGGTCCCGGCACGCGCGCCGAAAGCGCTCTCCCCGTCGCCGGCCGAGGCA
-1 VI RELATRAAKALPAAAPET-
GACGCCGGCGAGCACCTCCGTGAGGTCGCGGACCTCCATGCTCGAGTAGTCGGCGATGTT
8941 + + + + + + 9000
CTGCGGCCGCTCGTGGAGGCACTCCAGCGCCTGGAGGTACGAGCTCATCAGCCGCTACAA
-1 VGALVETLDRVEMS SYDAIN-
CTTGCCGTCGATCCGGACGTCGAGCGCGGCGGCGTTGAGCCGCGCGCCCCGGCAGGAGGG
9001 + + + + + + 9060
GAACGGCAGCTAGGCCTGCAGCTCGCGCCGCCGCAACTCGGCGCGCGGGGCCGTCCTCCC
-1 KGD I RVDLAAANLRAGRC S P -
ACAGACTCCGTCgGTGACGAAACGTTCGATGACCTCGCGCTTGCGGTCGcTCAGCGCGCT
9061 + + + + + + 9120
TGTCTGAGGCAGcCACTGCTTTGCAAGCTACTGGAGCGCGAACGCCAGCgAGTCGCGCGA
-1 CVGDTVFRE IVERKRDSLAS-
GAGGTCGCGCTTGAGGTTGAgCCGCTCGAACCGGTCGGCcAACCCCTCGTAGTTCGTCTG
9121 + + + + + + 9180
CTCCAGCGCGAACTCCAACTcGGCGAGCTTGGCCAGCCGgTTGGGGAGCATCAAGCAGAC
-1 LDRKLNLRE FRDALGEYNTQ-
GAACTCGGTGCTCTTGGTCTTCAGCGTcACCTTCCCGCCGGTGcCGCGCAGCAGCGTGTC
9181 + + + + + + 9240
CTTGAGCCACGAGAACCAGAAGTCGCAgTGGAAGGGCGGCCACgGCGCGTCGTCGCACAG
-1 FETS KTKLTVKGGTGRLLTD-
CAGCTCCTCGGCGCTGTACTCGGCGATCGGCTTGGCCGGATCCAGACGGCCGGACTTCGC
9241 + + + + + + 9300
GTCGAGGAGCCGCGACATGAGCCGCTAGCCGAACCGGCCTAGGTCTGCCGGCCTGAAGCG
-1 LEEASYEAI PKAPDLRGSKA-
CCAGATCTGCCAGTCCGGGCTACCCACCTTGTACTCGGGGAAAAGGACCGCCCCGTCGTC
9301 + + + + + + 9360
GGTCTAGACGGTCAGGCCCGATGGGTGGAACATGAGCCCCTTTTCCTGGCGGGGCAGCAG
-1 W I QWD PSGVKYE P FLVAGDD-
CAGGGACTTCGAGCGGTCCAGCATCTTGTCCAGGTCGAGGGCGATGCTCTGGCCGAGACC
9361 + + + + + + 9420
GTCCCTGAAGCTCGCCAGGTCGTAGAACAGGTCCAGCTCCCGCTACGAGACCGGCTCTGG
-1 LS KSRDLMKDLDLAI SQGLG-
GTCGCAGTCCGGGCACATGCCCTGGGGGTCGTTGAACGAGAACGCGGAGACGCCGAGCGA
9421 + + + + + + 9480
CAGCGTCAGGCCCGTGTACGGGACCCCCAGCAACTTGCTCTTGCGCCTCTGCGGCTCGCT
-1 DCDPCMGQPDNFS FASVGLS-
GGACGGCCCGTCGTCCTTCGTCGTGCCGAACCGTGCGAACAGGGCCCGGATCATCGGCTG
9481 + + + + + + 9540
CCTGCCGGGCAGCAGGAAGCAGCACGGCTTGGCACGCTTGTCCCGGGCCTAGTAGCCGAC
-1 SPGDDKTTGFRAFLARIMPQ-
TACGTCCGTCATGGTCCCCACCGTGGACCGGGCGTTGCCCCCCACGGGCTTCTGGTCGAC
9541 + + + + + + 9600
ATGCAGGCAGTACCAGGGGTGGCACCTGGCCCGCAACGGGGGGTGCCCGAAGACCAGCTG
-1 VDTMTGVTS RANGGVP KQDV-
GATCACCGGGGTGGTGAGGTTCTCGATCGCCTCGGCCTGAGGACGTTCGTACTTCGGAAG
9601 + + + + + + 9660
CTAGTGGCCCCACCACTCCAAGAGCTAGCGGAGCCGGACTCCTGCAAGCATGAAGCCTTC
-1 IVPTTLNE IAEAQPREYKPL-
CTGGTTGCGGATGTACCAGCTGAAGGTGGAGTTCAGCTGTCGCTGGGCCTCCACGGCCAC
9661 + + + + + + 9720
GACCAACGCCTACATGGTCGACTTCCACCTCAAGTCGACAGCGACCCGGAGGTGCCGGTG
-1 QNR I YWS FTSNLQRQAEVAV-
CGTGTCGAAGACGATCGACGACTTGCCCGAACCCGAGACCCCCGTGAAGACCGTGATCTG
9721 + + + + + + 9780
GCACAGCTTCTGCTAGCTGCTGAACGGGCTTGGGCTCTGGGGGCACTTCTGGCACTAGAC
-1 TDFVI SSKGSGSVGTFVTIQ-
GTTGCGGGGAATCGTCAGGGAGACATCTTTGAGGTTGTGGATCCGCGCGCCCGCGATGCG
9781 + + + + + + 9840
CAACGCCCCTTAGCAGTCCCTCTGTAGAAACTCCAACACCTAGGCGCGCGGGCGCTACGC
-1 NRPITLSVDKLNHI RAGAIR-
GATGCCGTCTCCCGGGCCGGATGTTTTTCCCGCGCCGGCGGTGGGGTCGGTGACGCTCAC
9841 + + + + + + 9900
CTACGGCAGAGGGCCCGGCCTACAAAAAGGGCGCGGCCGCCACCCCAGCCACTGCGAGTG
-1-< IGDGPGSTKGAGATPDTVSN-
AGAGTTTTCCTCCTGGCTTCCGTACATGATTTACCGTGTCAGCCGGGCAAACCGGCGGAA
9901 + + + + + + 9960
TCTCAAAAGGAGGACCGAAGGCATGTACTAAATGGCACAGTCGGCCCGTTTGGCCGCCTT
CGGTAACCACCTAGCTTGTACTCAGGAGGTGTCCGGGGTCTTCTCCTCCCGTGCTGACTT
9961 + + + + + + 10020
GCCATTGGTGGATCGAACATGAGTCCTCCACAGGCCCCAGAAGAGGAGGGCACGACTGAA
0-* * STDPTKEERASK-
GGGGGCCGGCCCGCCGGACAGGGCCGGCTCCGTGTTCCACCCCGCCAGCCGATCCCCCCG
10021 + + + + + + 10080
CCCCCGGCCGGGCGGCCTGTCCCGGCCGAGGCACAAGGTGGGGCGGTCGGCTAGGGGGGC
0 PAPGGSLAPETNWGALRDGR-
CTCCGTCTCGTCCTCCTCGAGAACGATCCGGCTGCTCGCCCAGCGCAGGATCGGCGGCGC
10081 + + + + + + 10140
GAGGCAGAGCAGGAGGAGCTCTTGCTAGGCCGACGAGCGGGTCGCGTCCTAGCCGCCGCG
0 ETEDEELVIRSSAWRLIPPA-
CGTCACCGAGGTGATGAGGGCGACCAGCACGATGATCGTGAAGGTCACGGTGTCCAGTAC
10141 + + + + + + 10200
GCAGTGGCTCCACTACTCCCGCTGGTCGTGCTACTAGCACTTCCAGTGCCACAGGTCATG
0 TVST I LAVLVI ITFTVTDLV-
GCCGATACGCAGGCCGACCAGGGCGATCACCACCTCGATCATTCCACGCGAGTTCATCCC
10201 + + + + + + 10260
CGGCTATGCGTCCGGCTGGTCCCGCTAGTGGTGGAGCTAGTAAGGTGCGCTCAAGTAGGG
0 G I RLGVLAIVVE IMGRSNMG-
CGCTCCGAGCGCCAGCCCCTCGTAGCGGCTCATCCCGCCACTACGGGCGGCGACGTACGC
10261 + + + + + + 10320
GCGAGGCTCGCGGTCGGGGAGCATCGCCGAGTAGGGCGGTGATGCCCGCCGCTGCATGCG
0 AGLALGEYRSMGGSRAAVYA-
ACCGGCGAACTTGCCGAAAGTGGCCACCAACAGCACCCCGAGGCCCGTGAGCAGCACCGA
10321 + + + + + + 10380
TGGCCGCTTGAACGGCTTTCACCGGTGGTTGTCGTGGGGCTCCGGGCACTCGTCGTGGCT
0 GAFKGFTAVLLVGLGTLLVS-
CGGCTCCGCGAGTGCGGTCAGGTCCATGCGAAGCCCCACACTGCCCAGGAACACCGGTGC
10381 + + + + + + 10440
GCCGAGGCGCTCACGCCAGTCCAGGTACGCTTCGGGGTGTGACGGGTCCTTGTGGCCACG
0 PEALATLDMRLGVSGLFVPA-
GAACACGGCCATGACCAGCGTGCGCAGCGGGGCGAGCCGTACCGGGGCGATGTGCCTCAG
10441 + + + + + + 10500
CTTGTGCCGGTACTGGTCGCACGCGTCGCCCCGCTCGGCATGGCCCCGCTACACGGAGTC
0 FVAMVLTRL PALRVPAI HRL-
CAGGGTCGCACCGGCCACGAACGCCCCGAACAACGCCTCCATCCCGGCCGCCGCGGTCAG
10501 + + + + + + 10560
GTCCCAGCGTGGCCGGTGCTTGCGGGGCTTGTTGCGGAGGTAGGGCCGGCGGCGCCAGTC
0 LTAGAVFAGFLAEMGAAATL-
CGCCCCGTACAGGACGACCACGGCCACGCCGACGGTGACGGCCGATACGGGGACCCGGCT
10561 + + + + + + 10620
GCGGGGCATGTCCTGCTGGTGCCGGTGCGGCTGCCACTGCCGGCTATGCCCCTGGGCCGA
0 AGYLVVVAVGVTVASVPVRS-
GTCACCCGTACGGGACAGCCGCCTGCCGATCGGGCCGCCCACCGCACACGCCGCGGCGAC
10621 + + + + + + 10680
CAGTGGGCATGCCCTGTCGGCGGACGGCTAGCCCGGCGGGTGGCGTGTGCGGCGCCGCTG
0 DGTRSLRRGI PGGVACAAAV-
GAAGACGGTCGTCCAGGCCATCGTGGTCAGGACCACGGGCCCCCCGGCCGCCCCACTCGC
10681 + + + + + + 10740
CTTCTGCCAGCAGGTCCGGTAGCACCAGTCCTGGTGCCCGGGGGGCCGGCGGGGTGAGCG
0 FVTTWAMTTLVVPGGAAGSA-
CAGCGCCGTCACCAGAGCGAGCAGCAGCCAGCCCACCGCGTCGTCGAACACCGCTGCCGC
10741 + + + + + + 10800
GTCGCGGCAGTGGTCTCGCTCGTCGTCGGTCGGGTGGCGCAGCAGCTTGTGGCGACGGCG
0 LATVLALLLWGVADD FVAAA-
GATGAGCAGCTGGCCGACGTTGCGGTGCGTCAGATTCAGGTCGGCGAGCGTCTTGGCGAT
10801 + + + + + + 10860
CTACTCGTCGACCGGCTGCAACGCCACGCAGTCTAAGTCCAGCCGCTCGCAGAACCGCTA
0 ILLQGVNRHTLNLDALTKAI-
CACCGGGAGGGCCGTGACACACATCGCGACCCCGAGGAACAGCGCGAAGACGCCCCGCTC
10861 + + + + + + 10920
GTGGCCCTCCCGGCACTGTGTGTAGCGCTGGGGCTCCTTGTCGCGCTTCTGCGGGGCGAG
0 VPLATVCMAVGL FLAFVGRE-
TCCGGAGTCCGCGAGCAGCGAGGCGGGCACCAGGTAGCCGGTGGCGATGCCCAGCCCCAG
10921 + + + + + + 10980
AGGCCTCAGGCGCTCGTCGCTCCGCCCGTGGTCCATCGGCCACCGCTACGGGTCGGGGTC
0 GSDALLSAPVLYGTAIGLGL-
AGGAATCAGAAGACCCGCCAGGCTGACCCGGGCGGCCAGACCCCCGCGCTTGCGCAGGAT
10981 + + + + + + 11040
TCCTTAGTCTTCTGGGCGGTCCGACTGGGCCCGCCGGTCTGGGGGCGCGAACGCGTCCTA
0 P I LLGALSVRAALGGRKRL I-
CCGGGGGTCGAACTGGGCACCTGCGATGGCCACCAGCAGAAGGACGCCGAACTGGCAGAA
11041 + + + + + + 11100
GGCCCCCAGCTTGACCCGTGGACGCTACCGGTGGTCGTCTTCCTGCGGCTTGACCGTCTT
0 RPDFQAGAIAVLLLVGFQCF-
CGCGTCGAGCAGGTGCGCCTGCGAGATGTCCTCGGGAAACAGCCTGCCGGAAAGTCCCGG
11101 + + + + + + 11160
GCGCAGCTCGTCCACGCGGACGCTCTACAGGAGCCCTTTGTCGGACGGCCTTTCAGGGCC
0 ADLLHAQS IDEPFLRGSLGP-
CGAGATCTGCCCCAGCAGGGTCGGCCCGAGCAGTACCCCCGCGGTCAGCTCCCCCACCAG
11161 + + + + + + 11220
GCTCTAGACGGGGTCGTCCCAGCCGGGCTCGTCATGGGGGCGCCAGTCGAGGGGGTGGTC
0 S I QGLLTPGLLVGATLEGVL-
CGGCGGCAGACCGATCCGGGTCCCCAGCCGTCCCAGACCGTAGGCACAGGCGAGCAGGAG
11221 + + + + + + 11280
GCCGCCGTCTGGCTAGGCCCAGGGGTCGGCAGGGTCTGGCATCCGTGTCCGCTCGTCCTC
0 PPLGIRTGLRGLGYACALLL-
GCCGACCTGGAGCAGGAAGACCGTCAGCGGCTCCCCGCCCAGCGGCGACGTGGCTGCGAG
11281 + + + + + + 11340
CGGCTGGACCTCGTCCTTCTGGCAGTCGCCGAGGGGCGGGTCGCCGCTGCACCGACGCTC
0 GVQLLFVTLPEGGLPSTAAL-
CACAGCCACGTCAGGACCGCGCACCGGGAACCCAGCCCAGCCCGTCCGTCGACGCGGCCA
11341 + + + + + + 11400
GTGTCGGTGCAGTCCTGGCGCGTGGCCCTTGGGTCGGGTCGGGCAGGCAGCTGCGCCGGT
0-< V A V -
11-* * SRAGPVWGLGDTSAA
GACCCCCCTGCCTCACCGGTCGCTCGGCCCCCGCCTCATCCCCCAGAAGAGCCCGTGCCT
11401 + + + + + + 11460
CTGGGGGGACGGAGTGGCCAGCGAGCCGGGGGCGGAGTAGGGGGTCTTCTCGGGCACGGA
11 LGGQRVPREAGAEDGLLARA
GCAGTGCGGCGCTCTGCTCCATGAGGCGGCCCACCACCTTTCCCGGCACGGCGCCGTGCG
11461 + + + + + + 11520
CGTCACGCCGCGAGACGAGGTACTCCGCCGGGTGGTGGAAAGGGCCGTGCCGCGGCACGC
11 QLAASQEMLRGVVKGPVAGH
GCCCGTCGGCGTCGCCCGCAGCGGTGTGCGTCATGCCGGCCATCTCGTCGGACGCCTCGG
11521 + + + + + + 11580
CGGGCAGCCGCAGCGGGCGTCGCCACACGCAGTACGGCCGGTAGAGCAGCCTGCGGAGCC
11 PGDADGAATHTMGAMED SAE
AGAACCGCTGCCTGGCCCGGGCCGTGTCGGCGAACTCGTCGGAGGAGACCCCGCCGATCA
11581 + + + + + + 11640
TCTTGGCGACGGACCGGGCCCGGCACAGCCGCTTGAGCAGCCTCCTCTGGGGCGGCTAGT
11 SFRQRARATDAFEDSSVGGI
GTTCGACGAAGGACTGCAGGTCGGAGTCCGCGGTGTTGGAGATCTTCCGGGCCTGCCAGA
11641 + + + + + + 11700
CAAGCTGCTTCCTGACGTCCAGCCTCAGGCGCCACAACCTCTAGAAGGCCCGGACGGTCT
11 LEVFSQLDSDATNS IKRAQW
AATAGGAGTCCTCCGAATGGTGCATGTCGTAGAAGCCGACCAGGAACTCGTAGAAGCGGC
11701 + + + + + + 11760
TTATCCTCAGGAGGCTTACCACGTACAGCATCTTCGGCTGGTCCTTGAGCATCTTCGCCG
11 FYSDESHHMDYFGVLFEYFR
CGTACTCCAGCCGGTAGCGGGCCTCGAACTCCTCGAACGCGCTGGTCTCGTCGACCGACC
11761 + + + + + + 11820
GCATGAGGTCGGCCATCGCCCGGAGCTTGAGGAGCTTGCGCGACCAGAGCAGCTGGCTGG
11 GYELRYRAEFEEFASTEDVS
CGTCCAGGCAGGAGTTGAGCGAGCGCGCTGCCAGCAGTCCGCTGTAGGTGGCGAGGTGCA
11821 + + + + + + 11880
GCAGGTCCGTCCTCAACTCGCTCGCGCGACGGTCGTCAGGCGACATCCACCGCTCCACGT
11 GDLCSNLSRAALLGSYTALH
CCCCGGAGGAGAACACCGGGTCGACGAAGCACGCGGCATCCCCGACCAGGGCCATGCCCG
11881 + + + + + + 11940
GGGGCCTCCTCTTGTGGCCCAGCTGCTTCGTGCGCCGTAGGGGCTGGTCCCGGTACGGGC
11 VGSSFVPDVFCAADGVLAMG
GCGCCCAGAACTTCGTGTTGCTGTACGACCAGTCCTTGCGGACCCGGAGCTCGCCGTAGG
11941 + + + + + + 12000
CGCGGGTCTTGAAGCACAACGACATGCTGGTCAGGAACGCCTGGGCCTCGAGCGGCATCC
11 PAWFKTNSYSWDKRVRLEGY
GGCCCTCGGTCACCCGGGTGGCCTCGGAGAGCTTCTCCGCGATCAGCGGGCAGGCCGCGA
12001 + + + + + + 12060
CCGGGAGCCAGTGGGCCCACCGGAGCCTCTCGAAGAGGCGCTAGTCGCCCGTCCGGCGCT
11 PGETVRTAESLKEAI LPCAA
TGAACGACTCCATCGCCTTCTCGGGGTCGCCCTGCACCAGGCTCGCCGAGTCCCGGTTCA
12061 + + + + + + 12120
ACTTGCTGAGGTAGCGGAAGAGCCCCAGCGGGACGTGGTCCGAGCGGCTCAGGGCCAAGT
11 I FSEMAKEPDGQVLSASDRN
CCACTGCGCCGACACTCGTCAGCTCGGGAGACAGGGGTATGTACCAGAACCACCCGTGCT
12121 + + + + + + 12180
GGTGACGCGGCTGTGAGCAGTCGAGCCCTCTGTCCCCATACATGGTCTTGGTGGGCACGA
11 VVAGVSTLEPSLPIYWFWGH
CGAAGGTGCAGGTGAAGATGTTCCCGGAGTTCGGCTTCGGAAGCCGCTTGCCGCCGTTGA
12181 + + + + + + 12240
GCTTCCACGTCCACTTCTACAAGGGCCTCAAGCCGAAGCCTTCGGCGAACGGCGGCAACT
11 EFTCTFINGSNPKPLRKGGN
AGTAGCCGAACAGGGCCAGGTTGCGGAAGAAGGGCGAGTACTCGCGCTTGGCGCCCGACT
12241 + + + + + + 12300
TCATCGGCTTGTCCCGGTCCAACGCCTTCTTCCCGCTCATGAGCGCGAACCGCGGGCTGA
11 FYGFLALNRFFPSYERKAGS
TCTTGTACAGCCCACCGGTGTTGCCGGAGGCGTCCACGACGAAACGGGAGCCCACCTCGT
12301 + + + + + + 12360
AGAACATGTCGGGTGGCCACAACGGCCTCCGCAGGTGCTGCTTTGCCCTCGGGTGGAGCA
11 KKYLGGTNGSADVVFRS GVE
GCTCGCGCCCCTCGGAGTCCCGGTAGCGCACGCCCCGCACCCGGCCGTCCTCGGCCTTGA
12361 + + + + + + 12420
CGAGCGCGGGGAGCCTCAGGGCCATCGCGTGCGGGGCGTGGGCCGGCAGGAGCCGGAACT
11 HERGESDRYRVGRVRGDEAK
GCACGTCGAGGACATCGCTGTTCTCCCGCACCTCGACACCGTGCCTGCGAGCGTTGTCGA
12421 + + + + + + 12480
CGTGCAGCTCCTGTAGCGACAAGAGGGCGTGGAGCTGTGGCACGGACGCTCGCAACAGCT
11 LVDLVDSNERVEVGHRRAND
GCAGGATCTGGTCGAACTTCATGCGCTCGACCTGGTACGCGTACCCCGTCGCCCCCGGCA
12481 + + + + + + 12540
CGTCCTAGACCAGCTTGAAGTACGCGAGCTGGACCATGCGCATGGGGCAGCGGGGGCCGT
11 LLIQDFKMREVQYAYGTAGP
TCCGGCGCGAGACGGCGAAGTCGAACGTCCACGGTTCGGGGTTGGCACCCCACTTGAACG
12541 + + + + + + 12600
AGGCCGCGCTCTGCCGCTTCAGCTTGCAGGTGCCAAGCCCCAACCGTGGGGTGAACTTGC
11 MRRSVAFDFTWPEPNAGWKF
TCCCGCCGTGCTTGATCGTGAAGGCTGCCTTCTTCAGCTCGTCGGAGACACCGAGGAGGT
12601 + + + + + + 12660
AGGGCGGCACGAACTAGCACTTCCGACGGAAGAAGTCGAGCAGCCTCTGTGGCTCCTCCA
11 TGGHKITFAAKKLEDSVGLL
GTGCGATGCCGTGGACGGTGGAGGGGAGGAGCGACTCACCGATCTGGTAGCGCGGGAAGG
12661 + + + + + + 12720
CACGCTACGGCACCTGCCACCTCCCCTCCTCGCTGAGTGGCTAGACCATCGCGCCCTTCC
11 HAIGHVTSPLLSEGIQYRPF
TCTCCTTCTCCAGCTGGAGTACGCGATGGCCCCGCTTGCGGACCAGCGTGGAGACGGTCG
12721 + + + + + + 12780
AGAGGAAGAGGTCGACCTCATGCGCTACCGGGGCGAACGCCTGGTCGCACCTCTGCCAGC
11 TEKELQLVRHGRKRVLTSVT
AGCCCGCCGGACCTCCGCCGACCACGATGACGTCGTACTGCGCTGACACGTCCACGGACT
12781 + + + + + + 12840
TCGGGCGGCCTGGAGGCGGCTGGTGCTACTGCAGCATGACGCGACTGTGCAGGTGCCTGA
11-< SGAPGGGVVIVDYQASVDM
CTCCTTCTCGCACATCGGGCGTCTCATATTCCCAGGAATCCTCTGGCCCGCCCAGGTGCT
12841 + + + + + + 12900
GAGGAAGAGCGTGTAGCCCGCAGAGTATAAGGGTCCTTAGGAGACCGGGCGGGTCCACGA
GCCGCATCTTCGGTATTGCGAAGTCGTGGGCATTCTGCGAGAAGCATGAACCGCGTGGCC
12901 + + + + + + 12960
CGGCGTAGAAGCCATAACGCTTCAGCACCCGTAAGACGCTCTTCGTACTTGGCGCACCGG
CGGTCTACAGTGGCGTGGAATTTCAGTGATTGCGCTGAAGGGCGGCACACGATGAAGGCA
12961 + + + + + + 13020
GCCAGATGTCACCGCACCTTAAAGTCACTAACGCGACTTCCCGCCGTGTGCTACTTCCGT
10-> M K A
CTTGTACTGTCGGGTGGTTCGGGGACCCGCCTGCGCCCGATCAGTTACGCCATGCCGAAG
13021 + + + + + + 13080
GAACATGACAGCCCACCAAGCCCCTGGGCGGACGCGGGCTAGTCAATGCGGTACGGCTTC
10 LVLSGGSGTRLRPI SYAMPK
CAGCTCGTTCCGATCGCCGGGAAGCCAGTCCTTGAATATGTTCTGGATAATATCCGGAAC
13081 + + + + + + 13140
GTCGAGCAAGGCTAGCGGCCCTTCGGTCAGGAACTTATACAAGACCTATTATAGGCCTTG
10 QLVPIAGKPVLEYVLDNIRN
CTCGATATCAAAGAGGTCGCCATTGTCGTCGGTGACTGGGCTCAGGAAATTATTGAGGCA
13141 + + + + + + 13200
GAGCTATAGTTTCTCCAGCGGTAACAGCAGCCACTGACCCGAGTCCTTTAATAACTCCGT
10 LDIKEVAIVVGDWAQEI IEA
ATGGGTGACGGCAGCCGTTTCGGTCTGCGCCTCACCTACATACGCCAGGAGCAACCTCTG
13201 + + + + + + 13260
TACCCACTGCCGTCGGCAAAGCCAGACGCGGAGTGGATGTATGCGGTCCTCGTTGGAGAC
10 MGDGSRFGLRLTYIRQEQPL
GGCATCGCGCACTGCGTGAAACTGGCCCGAGACTTCCTCGACGAGGACGACTTCGTCCTC
13261 + + + + + + 13320
CCGTAGCGCGTGACGCACTTTGACCGGGGTCTGAAGGAGCTGCTCCTGCTGAAGCAGGAG
10 GIAHCVKLARDFLDEDDFVL
TACCTAGGCGACATCATGCTGGACGGAGACCTGTCCGCGCAGGCGGGGCACTTCCTCCAC
13321 + + + + + + 13380
ATGGATCCGCTGTAGTACGACCTGCCTCTGGACAGGCGCGTCCGCCCCGTGAAGGAGGTG
10 YLGDIMLDGDLSAQAGHFLH
ACCCGCCCCGCCGCGCGGATCGTCGTGCGCCAGGTGCCCGACCCCCGGGCCTTCGGGGTG
13381 + + + + + + 13440
TGGGCGGGGCGGCGCGCCTAGCAGCACGCGGTCCACGGGCTGGGGGCCCGGAAGCCCCAC
10 TRPAARIVVRQVPDPRAFGV
ATCGAGCTGGACGGCGAAGGGCGTGTGCTGCGCCTGGTCGAGAAACCCCGTGAACCGCGC
13441 + + + + + + 13500
TAGCTCGACCTGCCGCTTCCCGCACACGACGCGGACCAGCTCTTTGGGGCACTTGGCGCG
10 IELDGEGRVLRLVEKPREPR
AGCGACCTCGCGGCGGTCGGCGTGTACTTCTTCACCGCGGACGTGCACCGCGCCGTCGAC
13501 + + + + + + 13560
TCGCTGGAGCGCCGCCAGCCGCACATGAAGAAGTGGCGCCTGCACGTGGCGCGGCAGCTG
10 SDLAAVGVYFFTADVHRAVD
GCGATTAGCCCGAGCCGACGGGGCGAGCTGGAAATCACCGACGCCATCCAGTGGCTGCTG
13561 + + + + + + 13620
CGCTAATCGGGCTCGGCTGCCCCGCTCGACCTTTAGTGGCTGCGGTAGGTCACCGACGAC
10 AISPSRRGELEITDAIQWLL
GAGCAGGGCCTGCCGGTCGAGGCCGGCCGCTACACGGACTACTGGAAGGACACCGGCCGG
13621 + + + + + + 13680
CTCGTCCCGGACGGCCAGCTCCGGCCGGCGATGTGCCTGATGACCTTCCTGTGGCCGGCC
10 EQGLPVEAGRYTDYWKDTGR
GTCGAGGACGTCGTGGAGTGCAACCGGCGGATGCTCGGCCGTCTGGCGCTCCAGGTGTCG
13681 + + + + + + 13740
CAGCTCCTGCAGCACCTCACGTTGGCCGCCTACGAGCCGGCAGACCGCGAGGTCCACAGC
VEDVVECNRRMLGRLALQVS
GGCGAGGTGGACCCGGAGAGCGAACTGGTGGGTGCGGTGGTCGTCGAGGAGGGCGCCCGG
13741 + + + + + + 13800
CCGCTCCACCTGGGCCTCTCGCTTGACCACCCACGCCACCAGCAGCTCCTCCCGCGGGCC
GEVDPES ELVGAVVVEEGAR
GTGACGCGTTCGCGGGTCGTGGGACCAGCGGTGATCGGCGCGGGCACGGTCGTCGAGGAC
13801 + + + + + + 13860
CACTGCGCAAGCGCCCAGCACCCTGGTCGCCACTAGCCGCGCCCGTGCCAGCAGCTCCTG
VTRSRVVGPAVI GAGTVVED
AGCCAGATCGGACCGTACGCCTCCATCGGCCGGCGCTGCACCGTGCGGGCGTCCCGGCTC
13861 + + + + + + 13920
TCGGTCTAGCCTGGCATGCGGAGGTAGCCGGCCGCGACGTGGCACGCCCGCAGGGCCGAG
SQIGPYAS IGRRCTVRASRL
TCCGACTCCATCGTCCTTGACGACGCCTCGATCCTCGCGGTGAGCGGACTGCACGGCTCG
13921 + + + + + + 13980
AGGCTGAGGTAGCAGGAACTGCTGCGGAGCTAGGAGCGCCACTCGCCTGACGTGCCGAGC
SDS IVLDDAS ILAVSGLHGS
CTGATCGGAAGGGGCGCGCGGATCGCGCCCGGGGCCCGGGGCGAGGCCCGGCACCGGCTG
13981 + + + + + + 14040
GACTAGCCTTCCCCGCGCGCCTAGCGCGGGCCCCGGGCCCCGCTCCGGGCCGTGGCCGAC
L IGRGARIAPGARGEARHRL
GTCGTCGGCGACCACGTGCAGATCGAGATCGCGGCCTGACGCACCCACCGGAGCACCGGG
14041 + + + + + + 14100
CAGCAGCCGCTGGTGCACGTCTAGCTCTAGCGCCGGACTGCGTGGGTGGCCTCGTGGCCC
-* VVGDHVQ I E IAA*-
GGGAGGCTCGGCAGGGGCGTCAGGCCGTAAGAAGGGCTGCCGGGGCGGGACGGACCCGCC
14101 + + + + + + 14160
CCCTCCGAGCCGTCCCCGCAGTCCGGCATTCTTCCCGACGGCCCCGCCCTGCCTGGGCGG
CCGGCAGCCCACAGGTCCCCGGTCCGCGGATATGGGGGACTCGAGGTTCGATCAGCCGAA
14161 + + + + + + 14220
GGCCGTCGGGTGTCCAGGGGCCAGGCGCCTATACCCCCTGAGCTCCAAGCTAGTCGGCTT
* * G F -
GGTCAGAGCCACGTGGCCGAGGTCGAGCCCGGAGTTGCCGGCGCCGAGGTTACAGGCGGC
14221 + + + + + + 14280
CCAGTCTCGGTGCACCGGCTCCAGCTCGGGCCTCAACGGCCGCGGCTCCAATGTCCGCCG
TLAVHGLDLGSNGAGLNCAA-
CGTGGCGCAGTCGACGCTGCCGACCGGCGTGCCTTCGGGCGTGGAGCCCGTGTACGACTT
14281 + + + + + + 14340
GCACCGCGTCAGCTGCGACGGCTGGCCGCACGGAAGCCCGCACCTCGGGCACATGCTGAA
TACDVSGVPTGEPTSGTYSK-
GCGCACGACGAAGCTGAACGACGCCGCTCCGGACGCGTCCGTGGTGAAGGACGTCGCGGT
14341 + + + + + + 14400
CGCGTGCTGCTTCGACTTGCTGCGGCGAGGCCTGCGCAGGCACCACTTCCTGCAGCGCCA
RVVFSFSAAGSADTTFSTAT-
CGCCGGGTTGCACGCGTCCTGGCCACCGACCGGAGCGCACTGGGCGATGTAGTAGGTCTC
14401 + + + + + + 14460
GCGGCCCAACGTGCGCAGGACCGGTGGCTGGCCTCGCGTGACCCGCTACATCATCCAGAG
APNCADQGGVPACQAI YYTE-
GCCGGCGGCGGCACCGCTGACCGACACCGACACGCTCTGTCCGTCACTCAGACCCGAGGC
14461 + + + + + + 14520
CGGCCGCCGCCGTGGCGACTGGCTGTGGCTGTGCGAGACAGGCAGTGAGTCTGGGCTCCG
9 GAAAGSVSVSVSQGDSLGSA-
GGGACTGACGGAGAAGGCGGGCGCGGCGAAGGCGACGGACTGTGCGGCGGCGGCCAGGCC
14521 + + + + + + 14580
CCCTGACTGCCTCTTCCGCCCGCGCCGCTTCCGCTGCCTGACACGCCGCCGCCGGTCCGG
9 P S V S FAPAAFAVS QAAAAL G -
GATGGATGCGACGGCCACGACGCCGAACCTGGAAGCACGGCGGGACATGTGACGTAACGA
14581 + + + + + + 14640
CTACCTACGCTGCCGGTGCTGCGGCTTGGACCTTCGTGCCGCCCTGTACACTGCATTGCT
9 I SAVAVVGFRSARRSMHRLS-
CATGCGTAGGCTCCGATTCGAGGAGGGGGTTGATCACTCCATGAAAGGATCACCTCGCCG
14641 + + + + + + 14700
GTACGCATCCGAGGCTAAGCTCCTCCCCCAACTAGTGAGGTACTTTCCTAGTGGAGCGGC
9-< M -
8-* * R A
GACGGCCGCCTGCATCTCCCTCTGTGCTCTCGTGGATTTCCGGCACGGCACTCCCGTCGA
14701 + + + + + + 14760
CTGCCGGCGGACGTAGAGGGAGACACGAGAGCACCTAAAGGCCGTGCCGTGAGGGCAGCT
8 PRGGADGETSEHIEPVASGD
CGGCCGCCCGCAGAATGCGGCAGACCCCCCGCACCTCCTCCGGCCCCACCGCCGTACCGG
14761 + + + + + + 14820
GCCGGCGGGCGTCTTACGCCGTCTGGGGGGCGTGGAGGAGGCCGGGGTGGCGGCATGGCC
8 VAARLIRCVGRVEEPGVATG
TGGGCAGCGACAGCACCCGCTCGGTGAGCGCCTCCACCTTCGGGAGCGGATCGGGCGCGT
14821 + + + + + + 14880
ACCCGTCGCTGTCGTGGGCGAGCCACTCGCGGAGGTGGAAGCCCTCGCCTAGCCCGCGCA
8 TPLSLVRETLAEVKPLPDPA
GGCGCGCGAGGTCGGACCGGTAGGGCTCGCAGCTGTGGCAGCCGGGGCTGAAGTAGGCGC
14881 + + + + + + 14940
CCGCGCGCTCCAGCCTGGCCATCCCGAGCGTCGACACCGTCGGCCCCGACTTCATCCGCG
8 HRALDSRYPECSHCGPS FYA
GGGCCAGGACGTTGTGCCGTTGGAGCACCGCCTGGAGTTCGTCGCGGTGCAGCCCGGCGC
14941 + + + + + + 15000
CCCGGTCCTGCAACACGGCAACCTCGTGGCGGACCTCAAGCAGCGCCACGTCGGGCCGCG
8 RALVNHRQLVAQLEDRHLGA
GGACGGCGTCCACCTCGATGACGACGTACTGGCAGTTCGACAGCTCGTTCGGATCCTGCG
15001 + + + + + + 15060
CCTGCCGCAGGTGGAGCTACTGCTGCATGACCGTCAAGCTGTCGAGCAAGCCTAGGACGC
8 RVADVE IVVYQCNSLENPDQ
GGCGGACCCGGACGCCGGGCAGTCCGTCGAGGTACTGCTCGTACAGACGGTAGTTGCGCC
15061 + + + + + + 15120
CCGCCTGGGCCTGCGGCCCGTCAGGCAGCTCCATGACGAGCATGTCTGCCATCAACGCGG
8 PRVRVGPLGDLYQEYLRYNR
GGTTGATCGCGGTGAAGTGATCGGCGGACTCCAGGGAGGTGAGGCCCATGGCCGCGCTGA
15121 + + + + + + 15180
CCAACTAGCGCCACTTCACTAGCCGCCTGAGGTCCCTCCACTCCGGGTACCGGCGCGACT
8 RNIATFHDASELSTLGMAAS
TCTCGTGCATCCGCGCGACCGTTCCGCTCCCGGTGATCTCATGCGCGGCGTTGAGCCCCT
15181 + + + + + + 15240
AGAGCACGTAGGCGCGCTGGCAAGGCGAGGGCCACTAGAGTACGCGCCGCAACTCGGGGA
8 IEHMRAVTGSGTIEHAANLG
GGTGGCGCATGGCCCGGAGCCGGTCGGCCAGGGCGTCGTCGTCGGTGACGATCGCCCCGC
15241 + + + + + + 15300
CCACCGCGTACCGGGCCTCGGCCAGCCGGTCCCGCAGCAGCAGCCACTGCTAGCGGGGCG
QHRMARLRDALADDDTVIAG
CCTCGAAGCTGTTCACGAACTTCGTCGCCTGGAAGCTGAAGATCTCCGCCGTGCCGAAGC
15301 + + + + + + 15360
GGAGCTTCGACAAGTGCTTGAAGCAGCGGACCTTCGACTTCTAGAGGCGGCACGGCTTCG
GEFSNVFKTAQFSFIEATGF
CGCCGATCGGCTTCGACCGGTAGGTGCAGCCGAAGGCGTGGGCGGCATCGAAGAGCAGGT
15361 + + + + + + 15420
GCGGCTAGCCGAAGCTGGCCATCCACGTCGGCTTCCGCACCCGCCGTAGCTTCTCGTCCA
GGI PKSRYTCGFAHAADFLL
GCAGCCCGTGCTCGGCGGCCAGCTTGGTCAGCTCGTCGATCCGGGCCGGTCTGCCGAAGA
15421 + + + + + + 15480
CGTCGGGCACGAGCCGCCGGTCGAACCAGTCGAGCAGCTAGGCCCGGCCAGACGGCTTCT
HLGHEAALKTLEDIRAPRGF
CGTGCACGTCCAGGATGGCGCGGGTACGCGGGCCGATGAGCCGCTCCACGTGTGCCACGT
15481 + + + + + + 15540
GCACGTGCAGGTCCTACCGCGCCCATGCGCCCGGCTACTCGGCGAGGTGCACACGGTGCA
VHVD L IARTR PG I LREVHAV
CCGCGGTTCCGGTCTCCTCGTCCAGTTCGCAGAAGACAGGCACCGCACCGATCCAGTCCA
15541 + + + + + + 15600
GGCGCCAAGGCCAGAGGAGCAGGTCAAGCGTCTTCTGTCCGTGGCGTGGCTAGGTCAGGT
DATGTEEDLECFVPVAGIWD
GTGCGTGGGCGGTGGCGACCCAGGTGAAGGAGGGCACGATCACCTCGTCCCCAGGACCGA
15601 + + + + + + 15660
CACGCACCCGCCACCGCTGGGTCCACTTCCTCCCGTGCTAGTGGAGCAGGGGTCCTGGCT
LAHATAVWTFS PVIVEDGPG
TGCCCAGGGCCTTCGCGGCGACCTGGATGCCGGTGGTGGCGTTCGATACGGCGACGCAGT
15661 + + + + + + 15720
ACGGGTCCCGGAAGCGCCGCTGGACCTACGGCCACCACCGCAAGCTATGCCGCTGCGTCA
IGLAKAAVQIGTTANSVAVC
GCCTGACCTGGGTCAGCTCGGCCACACGGGCCTCGAACTCCCGGACCAGGGGGCCGTCAT
15721 + + + + + + 15780
CGGACTGGACCCAGTCGAGCCGGTGTGCCCGGAGCTTGAGGGCCTGGTCCCCCGGCAGTA
HRVQTLEAVRAEFERVLPGD
TGGTGAACCACAGGCGCTCCAGCGCCCCGTCGATCCGTTCCATCAAACGGTCGCGGGAGC
15781 + + + + + + 15840
ACCACTTGGTGTCCGCGAGGTCGCGGGGCAGCTAGGCAAGGTAGTTTGCCAGCGCCCTCG
NTFWLRELAGDIREMLRDRS
CCACGTTCGGGCGTCCCACGTGCAGCGGTTCGCTGAAGTAGGGCGTGGGTAGGGAGTCCA
15841 + + + + + + 15900
GGTGCAAGCCCGCAGGGTGCACGTCGCCAAGCGACTTCATCCCGCACCCATCCCTCAGGT
GVNPRGVHLPESFYPTPLSD
GACGCACCGGGCCGCCGCTCATGCCGTGCGCACGCCGACGAAGAGGCCGGGGCTGTTGGG
15901 + + + + + + 15960
CTGCGTGGCCCGGCGGCGAGTACGGCACGCGTGCGGCTGCTTCTCCGGCCCCGACAACCC
DAPGRRSCRAHADEEAGAVG
THRAAAHAVRTPTKRPGLLG -
RTGPPLMPCARRRRGRGCWA-
15901 + + + + + + 15960
* *ATRVGVFLGPSNP-
< LRVPGGSM
CCGGCCGTCGGCCAGCCGGAAGCCGGGCACGAACCGCACCGAGAGCCCCACCGATTCGAA
15961 + + + + + + 16020
GGCCGGCAGCCGGTCGGCCTTCGGCCCGTGCTTGGCGTGGCTCTCGGGGTGGCTAAGCTT
7 RGDALRFGPVFRVS LGVSEF-
GGCGTCGGTGTACTGCTCGCGGGTGAAGAGGCTGGAGGTCAGGACCTCGGAGAACTCTCT
16021 + + + + + + 16080
CCGCAGCCACATGACGAGCGCCCACTTCTCCGACCTCCAGTCCTGGAGCCTCTTGAGAGA
7 ADTYQERTFLS STLVESFER-
GAAGCCGGAGGCGTCCGCGACCCGGAACCGGACCTCCAGACGTGACTTGTCGCCCTGGCG
16081 + + + + + + 16140
CTTCGGCCTCCGCAGGCGCTGGGCCTTGGCCTGGAGGTCTGCACTGAACAGCGGGACCGC
7 FGSADAVRFRVELRS KDGQR-
CACGGAGTGCGTCATCCGCGTGATGACACGGCCCTCCTCCTGGTGCAGATGGCCGCCGAC
16141 + + + + + + 16200
GTGCCTCACGCAGTAGGCGCACTACTGTGCCGGGAGGAGGACCACGTCTACCGGCGGCTG
7 VSHTMRTIVRGEEQHLHGGV-
ATGCCCGTCGAGGAAGTTCTCGGGGAAATACCAGGGTTCGGCGACGAGGACTCCCCCGGG
16201 + + + + + + 16260
TACGGGCAGCTCCTTCAAGAGCCCCTTTATGGTCCCAAGCCGCTGCTCCTGAGGGGGCCC
7 HGDLFNEPFYWPEAVLVGGP-
GTTCAGGTGGTGGGCCATGGCCGACACCGCGGCCTTGAGCTCGGTGACGGACCCCATCTC
16261 + + + + + + 16320
CAAGTCCACCACCCGGTACCGGCTGTGGCGCCGGAACTCGAGCCACTGCCTGGGGTAGAG
7 NLHHAMAS VAAKL ETVS GME-
GCCGAGCGCGTTGCCCATGCAGGTGATCGCGTCGAAGGTGCGGCCCAGGTCGAACGAACG
16321 + + + + + + 16380
CGGCTCGCGCAACGGGTACGTCCACTAGCGCAGCTTCCACGCCGGGTCCAGCTTGCTTGC
7 GLANGMCT IADFT RGLDFS R -
CATGTCACCGGCGTGCAGCGGGACGCCGGGAAGCCGGCCCGCCGCCTGCTCCAGCATCGC
16381 + + + + + + 16440
GTACAGTGGCCGCACGTCGCCCTGCGGCCCTTCGGCCGGGCGGCGGACGAGGTCGTAGCG
7 MDGAHL PVGPLRGAAQELMA-
GGGCGCGTACTCGAGGCCCTCCACATGGCCGAAGAGCGTGGCGAGCGTCTCCAGATGGGC
16441 + + + + + + 16500
CCCGCGCATGAGCTCCGGGAGGTGTACCGGCTTCTCGCACCGCTCGCAGAGGTCTACCCG
7 PAYELGEVHGFLTALTELHA-
TCCGGTGCCGCAGGCGACGTCCAGGAGCGACACGGCGTCGGGGCGGGCGGCGAGGATCAG
16501 + + + + + + 16560
AGGCCACGGCGTCCGCTGCAGGTCCTCGCTGTGCCGCAGCCCCGCCCGCCGCTCCTAGTC
7 GTGCAVDLLSVAD PRAAL I L-
CTCGGTGAGCCCGCGGGCCTCCAGGTCGAAGTCCTTGCCGCGGCTGCGGAACACGAGGTC
16561 + + + + + + 16620
GAGCCACTCGGGCGCCCGGAGGTCCAGCTTCAGGAACGGCGCCGACGCCTTGTGCTCCAG
7 ETLGRAELDFDKGRSRFVLD-
GTAGAACTTCGCGTGCTCGGGGCCGTACTCCATCAGACGAGCTCCTTCGCAGACTGGGCG
16621 + + + + + + 16680
CATCTTGAAGCGCACGAGCCCCGGCATGAGGTAGTCTGCTCGAGGAAGCGTCTGACCCGC
7-< YFKAHEPGYEM-
6-* *VLEKASQA-
GAGATGATTCTGGGCTCCGGGATGGGAACGATGAACTTCCCTCCCGCCTCCAGGAAGCGG
16681 + + + + + + 16740
CTCTACTAAGACCCGAGGCCCTACCCTTGCTACTTGAAGGGAGGGCGGAGGTCCTTCGCC
6 SIIRPEPIPVIFKGGAELFR-
CGCTCCTTGCGGACGACCTCGTCGGTGTAGTTCCAGGCGAGGAGGAGGTAGTAGTCCGGC
16741 + + + + + + 16800
GCGAGGAACGCCTGCTGGAGCAGCCACATCAAGGTCCGCTCCTCCTCCATCATCAGGCCG
6 REKRVVEDTYNWALLLYYDP-
TCGGTGGCAGCGACCTCCTCCGGAGGAAGGACCGGGATGCGGTTCCCCGGCAGCAGTTTG
16801 + + + + + + 16860
AGCCACCGTCGCTGGAGGAGGCCTCCTTCCTGGCCCTACGCCAAGGGGCCGTCGTCAAAC
6 ETAAVEEPPLVPIRNGPLLK-
CCGTGCTTGAGGCTGGTGGTGTCGCCGCAGACGGTGATGTCCTGATCCGTCAGACCGCAG
16861 + + + + + + 16920
GGCACGAACTCCGACCACCACAGCGGCGTCTGCCACTACAGGACTAGGCAGTCTGGCGTC
6 GHKLSTTDGCVTIDQDTLGC
GCCATCAGCAACTGGGTCCCCTTGGACGGTGCTCCGTAGCCGGCCACGCGGTGGCCGTCC
16921 + + + + + + 16980
CGGTAGTCGTTGACCCAGGGGAACCTGCCACGAGGCATCGGCCGGTGCGCCACCGGCAGG
6 AMLLQTGKS PAGYGAVRHGD
GCGGCCAGACCGCGAACGAGCGTACGGATCGCTTCGGTCACGCGCGTCACCCGCTCGGCG
16981 + + + + + + 17040
CGCCGGTCTGGCGCTTGCTCGCATGCCTAGCGAAGCCAGTGCGCGCAGTGGGCGAGCCGC
6 AALGRVLTRIAETVRTVREA-
AACGCCCGGTAGGGGGCATCCGTCAGCAGTCCGCGCTCCTCCTCCAGGCCGAGCAGCGCC
17041 + + + + + + 17100
TTGCGGGCCATCCCCCGTAGGCAGTCGTCAGGCGCGAGGAGGAGGTCCGGCTCGTCGCGG
6 FARYPADTLLGREEELGLLA-
GCGACCGAGGGCTCCGGGACCCGTGCGGCCGACTCGCGCGCGGCGACGACCGCGATCGAA
17101 + + + + + + 17160
CGCTGGCTCCCGAGGCCCTGGGCACGCCGGCTGAGCGCGCGCCGCTGCTGGCGCTAGCTT
6 AVSPEPVRAASERAAVVAIS-
CCGCCGTGCACGGCGACCCGCTCCACGTCGATGATCCGCAGGCCGTGCGCGCCGAAGAGG
17161 + + + + + + 17220
GGCGGCACGTGCCGCTGGGCGAGGTGCAGCTACTAGGCGTCCGGCACGCGCGGCTTCTCC
6 GGHVAVREVDI IRLGHAGFL-
TGGCGCAGTGTGTGCAGGGAGAAGTACGACAGGTGCTCGTGGTAGATCGTGTCGAACTGG
17221 + + + + + + 17280
ACCGCGTCACACACGTCCCTCTTCATGCTGTCCACGAGCACCATCTAGCACAGCTTGACC
6 HRLTHLSFYSLHEHYITDFQ-
TTCTCGTCGAGCAGGTTCAGCAGGTACGGCACCTCGATGACCAGGACGCCGTCGTCGTCG
17281 + + + + + + 17340
AAGAGCAGCTCGTCCAAGTCGTCCATGCCGTGGAGCTACTGGTCCTGCGGCAGCAGCAGC
6 NEDLLNLLYPVEIVLVGDDD
AGCACTGCGTCGACGCCGTCCAGGATGCGGTGCACGTCGTCGATGTGCGCGAAGCACTGG
17341 + + + + + + 17400
TCGTGACGCAGCTGCGGCAGGTCCTACGCCACGTGCAGCAGCTACACGCGCTTCGTGACC
6 LVADVGDL I RHVDD I HAFCQ-
CGGCCGATGACGGCCTTGGCCCTGCCCTGCTCAAGGGCGATGCGGCCCGCGGGCTCCGGG
17401 + + + + + + 17460
GCCGGCTACTGCCGGAACCGGGACGGGACGAGTTCCCGCTACGCCGGGCGCCCGAGGCCC
6 RGIVAKARGQELAI RGAPE P
CCGAAGAAGTCCGGGTCCGTGGGGATCCCCCGGGCGTTGGCGATCTCGGCGAGGTTGGCC
17461 + + + + + + 17520
GGCTTCTTCAGGCCCAGGCACCCCTAGGGGGCCCGCAACCGCTAGAGCCGCTCCAACCGG
6 GFFDPDTP IGRANAIEALNA-
GCCGGGTCGACCCCGGCCACCCGCATGCCCGCCGCCCGGAACATCGCGAGCTGGGTGCCG
17521 + + + + + + 17580
CGGCCCAGCTGGGGCCGGTGGGCGTACGGGCGGCGGGCCTTGTAGCGCTCGACCCACGGC
6 APDVGAVRMGAARFMALQTG-
ACGTTGCTGCCCAGCTCCACGACCAGGTCGCCGGAGGCGAGGCTTGCCCGGCGGGTCGCC
17581 + + + + + + 17640
TGCAACGACGGGTCGAGGTGCTGGTCCAGCGGCCTCCGCTCCGAACGGGCCGCCCAGCGG
6 VNSGLEVVLDGSALSARRTA-
AGCCCGACGATGTGCGCCATGTGCTCGCGGATCTGGTCGGAGTCGGAGGAGACGTAGACG
17641 + + + + + + 17700
TCGGGCTGCTACACGCGGTACACGAGCGCCTAGACCAGCCTCAGCCTCCTCTGCATCTGC
6 LGVIHAMHERIQDSDSSVYV-
TAGTGCTTGAACAGTGTCCCGGGGTCGACGACATGGCGAAGCGTCATCAGCCGGCACGAC
17701 + + + + + + 17760
ATCACGAACTTGTCACAGGGCCCCAGCTGCTGTACCGCTTCGCAGTAGTCGGCCGTGCTG
6 YHKFLTGPDVVHRLTMLRCS-
CGGCACACGATGACGTCGAGCGGGAAGACGTCCTGCGCCTCATCGGCGTCGGCCGGATCG
17761 + + + + + + 17820
GCCGTGTGCTACTGCAGCTCGCCCTTCTGCAGGACGCGGAGTAGCCGCAGCCGGCCTAGC
6 RCVIVDLPFVDQAEDADAPD
ACGAACCCGTTGGCCAGCGGCAGCGAGCCGAAGGAGATCACCTCGGTCCAGTCGTCCGCA
17821 + + + + + + 17880
TGCTTGGGCAACCGGTCGCCGTCGCTCGGCTTCCTCTAGTGGAGCCAGGTCAGCAGGCGT
6 VFGNALPLSGFS IVETWDDA-
CCGCATACACGGCACGTCTCGTCCCGCCTGCATTTCTCCAGCATGAAGTCTCCTGACGGC
17881 + + + + + + 17940
GGCGTATGTGCCGTGCAGAGCAGGGCGGACGTAAAGAGGTCGTACTTCAGAGGACTGCCG
6-< GCVRCTEDRRCKELM-
GAATGCCGACGCATCGGGCCCGTCGGTCCGGGGACGGTCAATCTAGGGTTCCGGCCGACG
17941 + + + + + + 18000
CTTACGGCTGCGTAGCCCGGGCAGCCAGGCCCCTGCCAGTTAGATCCCAAGGCCGGCTGC
GGCGCTCCACTTCGTATGTGCCCTACTGGTTCAGCGGAGCGGACGGGTGAACGCCCGTAC
18001 + + + + + + 18060
CCGCGAGGTGAAGCATACACGGGATGACCAAGTCGCCTCGCCTGCCCACTTGCGGGCATG
17-* * RL PRT FARV-
GTCCTCGATGAGGAGCTGCGGCTGCTCCATGGCCGCGAAGTGCCCGCCGCGGTCGAACTC
18061 + + + + + + 18120
CAGGAGCTACTCCTCGACGCCGACGAGGTACCGGCGCTTCACGGGCGGCGCCAGCTTGAG
17 DEILLQPQEMAAFHGGRDFE-
GGTCCACCGCGTCAGGGTCGGCAGGATGCCCTCGGCGAACGACCGGATCGGCCGGGTGGC
18121 + + + + + + 18180
CCAGGTGGCGCAGTCCCAGCCGTCCTACGGGAGCCGCTTGCTGGCCTAGCCGGCCCACCG
17 TWRTLTPLIGEAFSRI PRTA-
GTCGTCCGGGAACACCGCGACGCCGACGGGGGCCGTCAGCGGCCAGGGCCCGCCCCAGGT
18181 + + + + + + 18240
CAGCAGGCCCTTGTGGCGCTGCGGCTGCCCCCGGCAGTCGCCGGTCCCGGGCGGGGTCCA
17 DDPFVAVGVPATLPWPGGWT-
GCGGGCGAAGTCCGCCATGCCGCGAGCCGACTCGTAGTACAACTGAGCGCTGGAACCGGC
18241 + + + + + + 18300
CGCCCGCTTCAGGCGGTACGGCGCTCGGCTGAGCATCATGTTGACTCGCGACCTTGGCCG
17 RAFDAMGRAS EYYLQAS SGA-
CGTCGCGGTCAGCCAGTAGATCATCACGTGGGTGAGCAGCCGGTCCCGGGAGATGGCCTC
18301 + + + + + + 18360
GCAGCGCCAGTCGGTCATCTAGTAGTGCACCCACTCGTCGGCCAGGGCCCTCTACCGGAG
17 TATLWYIMVHTLLRDRS I A E -
CTCCACGTTCTTGCCGCCGCTCCACTCCTGGAACTTGTCGAGAATCCAGGCGAGCTGGCC
18361 + + + + + + 18420
GAGGTGCAAGAACGGCGGCGAGGTGAGGACCTTGAACAGCTCTTAGGTCCGCTCGACCGG
17 EVNKGGSWEQFKDL IWALQG-
GACCGGGGAGTCGGTGAGGCCGTAGGCCAGGGTCTGCGGGCGGGTGGCCTGGATGCGCTG
18421 + + + + + + 18480
CTGGCCCCTCAGCCACTCCGGCATCCGGTCCCAGACGCCCGCCCACCGGACCTACGCGAC
17 VPSDTLGYALTQPRTAQ IRQ-
CCAGCCGATGCCGGTGTCGGCGAACTCCCCGCTGTGCGCCAGCTTGCCCAGGTCGCTCTC
18481 + + + + + + 18540
GGTCGGCTACGGCCACAGCCGCTTGAGGGGCGACACGCGGTCGAACGGGTCCAGCGAGAG
17 WGIGTDAFEGSHALKGLDSE-
GTCCAGGCGCCCGATGGCCTCCGGGGCGTCCTGGGGCGGGAAGGTCACCAGCATGTTCAG
18541 + + + + + + 18600
CAGGTCCGCGGGCTACCGGAGGCCCCGCAGGACCCCGCCCTTCCAGTGGTCGTACAAGTC
17 DLRGIAEPADQPPFTVLMNL-
GTGGACGCCGGCCACGTGCTCGGGGTCGGCCAGCCCCAGCTCCAGCGAGACGACCTTTCC
18601 + + + + + + 18660
CACCTGCGGCCGGTGCACGAGCCCCAGCCGGTCGGGGTCGAGGTCGCTCTGCTGGAAAGG
17 HVGAVHE PDALGLELSVVKG-
CCAGTCGCCGCCCTGGGCGACGTAACGCTCGTAGCCGAGGCGGTTCATCAGCTCCGCCCA
18661 + + + + + + 18720
GGTCAGCGGCGGGACCCGCTGCATTGCGAGCATCGGCTCCGCCAAGTAGTCGAGGCGGGT
17 WDGGQAVYREYGLRNMLEAW-
GGCGCGTGCGATCCGCCGCACGTCCCAGCCCGGCTCGGCAGTCGGGCCGGAGAAGCCGTA
18721 + + + + + + 18780
CCGCGCACGCTAGGCGGCGTGCAGGGTCGGGCCGAGCCGTCAGCCCGGCCTCTTCGGCAT
17 ARAI RRVDWGPEATPGS FGY-
GCCCGGCATGGAGGGGACGACGACGTGGAAGGCGTCCGCCGGGTCGCCGCCGTGCGCGCG
18781 + + + + + + 18840
CGGGCCGTACCTCCCCTGCTGCTGCACCTTCCGCAGGCGGCCCAGCGGCGGCACGCGCGC
17 GPMS PVVVHFADAPDGGHAR-
CGGGTCGCTCAGCGGCCCGATGACGTCGAGGAACTCGGCGACCGAGCCCGGCCAGCCGTG
18841 + + + + + + 18900
GCCCAGCGAGTCGCCGGGCTACTGCAGCTCCTTGAGCCGCTGGCTCGGGCCGGTCGGCAC
17 PDSLPGIVDLFEAVSGPWGH-
GGTGAGGATCAGCGGGATCGCGTCCGGCTCGGGCGAACGCACGTGAAGGAAGTGCACGTC
18901 + + + + + + 18960
CCACTCCTAGTCGCCCTAGCGCAGGCCGAGCCCGCTTGCGTGCACTTCCTTCACGTGCAG
17 TLILPIADPEPSRVHLFHVD-
GGCGCCGTCGATCGTGGTGACGAACTGGGGGAACGCGTTCAGCTCGGCCTCCGCGGCACG
18961 + + + + + + 19020
CCGCGGCAGCTAGCACCACTGCTTGACCCCCTTGCGCAAGTCGAGCCGGAGGCGCCGTGC
17 AGD I TTVFQPFANLEAEAAR-
CCAGTCGTAGCCGTGGCGCCAGTGGTCGGTGAGCTCCTTGAGGTAGGACAGCGGCACTCC
19021 + + + + + + 19080
GGTCAGCATCGGCACCGCGGTCACCAGCCACTCGAGGAACTCCATCCTGTCGCCGTGAGG
17 WDYGHRWHDTLEKLY S L PVG-
GCGGTCCCATCCGGATCCGGGTATCTCGGACGGCCACCGGGTCGCGTCGATCCGCCGGGT
19081 + + + + + + 19140
CGCCAGGGTAGGCCTAGGCCCATAGAGCCTGCCGGTGGCCCAGCGCAGCTAGGCGGCCCA
17 RDWGSGPI ESPWRTADIRRT-
TAAGGTCGTCGAATGTCGGACTGGGTCGATCTCGATACGGAAGGGACGCACAGTGAATCC
19141 + + + + + + 19200
ATTCCAGCAGCTTACAGCCTGACCCAGCTAGAGCTATGCCTTCCCTGCGTGTCACTTAGG
17-< LTTSHRVPD I E I RFPRM-
ACCCTCGTGATTGTGGGAGCGGGGCGGCGCGAGGCGGCCGCCCCGATGTGATCCGGGGAC
19201 + + + + + + 19260
TGGGAGCACTAACACCCTCGCCCCGCCGCGCTCCGCCGGCGGGGCTACACTAGGCCCCTG
CGTGTCTCAGGCCGGTTCGGCCGGCGCGGCCGCGCCTTCCCGTGCGGAGAAGGACCGCAG
19261 + + + + + + 19320
GCACAGAGTCCGGCCAAGCCGGCCGCGCCGGCGCGGAAGGGCACGCCTCTTCCTGGCGTG
16-* *AP EAPAAAGERAS FSRV-
GGAGGACAGGAAGTTGCGGATCATCGGCATGCCGTGTTCGGTCCGGAAGCTCTCCGGATG
19321 + + + + + + 19380
CCTCCTGTCCTTCAACGCCTAGTAGCCGTACGGCACAAGCCAGGCCTTCGAGAGGCCTAC
16 S SLFNRIMPMGHETRFSEPH-
GAACTGGACGGACTCCACCGGCAGCGAACGGTGGCGCAGGCCCATCACGTACCCGTCGTC
19381 + + + + + + 19440
CTTGACCTGCCTGAGGTGGCCGTCGCTTGCCACCGCGTCCGGGTAGTGCATGGGCAGCAG
16 F QVS EVPLS RHRL GMVYGDD-
CGTGGAGCGCCCGGTGACCTCGAGGGACGGCGGGACCGTGCCCTCCGGCACGATCAGTGA
19441 + + + + + + 19500
GCACCTCGCGGGCCACTGGAGCTCCCTGCCGCCCTGGCACGGGAGGCCGTGCTAGTCACT
16 TSRGTVELSPPVTGEPVILS-
GTGGTAGCGGGTCGCGAAGAACCCCGCGGGCAGCCCGGTGAACACTCCGCGCCCGTCGTG
19501 + + + + + + 19560
CACCATCGCCCAGCGCTTCTTGGGGCGCCCGTCGGGCCACTTGTGAGGCGCGGGCAGCAC
16 HYRTAFFGAPLGTFVGRGDH-
CGTGATCCGGCTCGTCTTCCCGTGCATGAGATGCCGGGCGGGGACGGTGGCGGCGCCGTA
19561 + + + + + + 19620
GCACTAGGCCGAGCAGAAGGGCACGTACTCTACGGCCCGCCCCTGCCACCGCCGCGGCAT
16 TIRSTKGHMLHRAPVTAAGY-
GGCGCGGGCGACGGCCTGATGCCCCAGACAGACCCCGAGCAGCGGGACCCGGCCGGCGAA
19621 + + + + + + 19680
CCGCGCCCGCTGCCGGACTACGGGGTCTGTCTGGGGCTCGTCGCCCTGGGCCGGCCGCTT
16 ARAVAQHGLCVGLL PVRGAF-
GGCCTGGACGATCTCGACGTGCCCGGAGGTGTCGGGGTGGCCGGGGCCCGGCCCCAGCAG
19681 + + + + + + 19740
CCGGACCTGCTAGAGCTGCACGGGCCTCCACAGCCCCACCGGCCCCGGGCCGGGGTCGTC
16 AQVI EVHGSTDPHGPGPGLL-
GACCGCGTCCGGCCGCATCAGCCCCATCTCGTCCGGGGTCATGAGATGCGACCGCACCAT
19741 + + + + + + 19800
CTGGCGCAGGCCGGCGTAGTCGGGGTAGAGCAGGCCCCAGTACTCTACGCTGGCGTGGTA
16 VAD PRMLGMED P TMLH S RVM-
GACGGGCTCCGCGCCGGCGGACATCAGATACTGGCGCAGGATGTCGACGAAGCTGTCGAA
19801 + + + + + + 19860
CTGCCCGAGGCGCGGCCGCCTGTAGTCTATGACCGCGTCCTACAGCTGCTTCGACAGCTT
16 VPEAGASMLYQRLIDVFSDF-
CGCGTCGACCACCAGGACCCGCGGGGCCTCGGTGCCTGCGCCGGATCCGTCGGGAGACCA
19861 + + + + + + 19920
GCGCAGCTGGTGGTCCTGGGCGCCCCGGAGCCACGGACGCGGCCTAGGCAGCCCTCTGGT
16 ADVVLVRPAETGAGSGDPSW-
CAAGCTCACAGCAACTCCTCTCCGGTGACCGCCCAGTGAGTGGCGCTCATCTTGGCCAGC
19921 + + + + + + 19980
GTTCGAGTGTCGTTGAGGAGAGGCCACTGGCGGGTCACTCACCGCGAGTAGAACCGGTCG
16-< L S M -
15-* * LLEEGTVAWHTASMKAL
GTCTCGGTCCACTCCGCCCCCGGTTCGGAATCGGCGACGATTCCGGCCGAGGCCCGGGTG
19981 + + + + + + 20040
CAGAGCCAGGTGAGGCGGGGGCCAAGCCTTAGCCGCTGCTAAGGCCGGCTCCGGGCCCAC
15 TETWEAGPESDAVIGASART-
CGGTAGACGCCCTCGTGGTGGAAAAGGGTCCGGATGCACAGCGCGAGGTTGGTGTACCCG
20041 + + + + + + 20100
GCCATCTGCGGGAGCACCACCTTTTCCCAGGCCTACGTGTCGCGCTCCAACCACATGGGC
15 RYVGEHHFLTRI CLALNTYG
CCCACGTCGAGGAGGCCGAGCGCCCCGGCGTACAGGCCGCGGCGGCTGCGTTCGACGGAC
20101 + + + + + + 20160
GGGTGCAGCTCCTCCGGCTCGCGGGGCCGCATGTCCGGCGCCGCCGACGCAAGCTGCCTG
15 GVDLLGLAGAYLGRRSREVS-
TCGATGATCTCCATGGCGCGGATCTTCGGCGCGCCCGTCATGGTGCCGGCGGGGAACAGG
20161 + + + + + + 20220
AGCTACTAGAGGTACCGCGCCTAGAAGCCGCGCGGGCAGTACCACGGCCGCCCCTTGTCC
15 EI IEMARI KPAGTMTGAPFL
GCGGCGATGGTGTCGAAGGCATCGGTGTCCACCCGCGCCCGGCCGACGACCGTGGAGACC
20221 + + + + + + 20280
CGCCGCTACCACAGCTTCCGTAGCCACAGGTGGGCGCGGGCCGGCTGCTGGCACCTCTGG
15 AAITDFADTDVRARGVVTSV-
AGGTGCAGCACGTGGGAGTAGCCCTCCACGTCCAGCTGGTCGGGTACGTCGAGCGTGTTC
20281 + + + + + + 20340
TCCACGTCGTGCACCCTCATCGGGAGGTGCAGGTCGACCAGCCCATGCAGCTCGCACAAG
15 LHLVHSYGEVDLQDPVDLTN
GGCCGGGCGATCCGTCCGATGTCGTTGCGGCAGAGGTCCACCAGCATGGTGTGCTCGGCG
20341 + + + + + + 20400
CCGGCCCGCTAGGCAGGCTACAGCAACGCCGTCTCCAGGTGGTCGTACCACACGAGCCGC
15 PRAIRGIDNRCLDVLMTHEA-
ATCTCCTTGGGATCCGACCTCAGCCGGACTCCCGCGGCGATGCCGCCGTCCGCGCCGGAC
20401 + + + + + + 20460
TAGAGGAAGCCTAGGCTGGAGTCGGCCTGAGGGCGCCGCTACGGCGGCAGGCGCGGCCTG
15 IEKPDSRLRVGAAIGGDAGS
CGCGGCACCGTGCCCGCGATCGGCCGCATCGTGACCTCGCCGTCCTCGATGCGTACGAAC
20461 + + + + + + 20520
GCGCCGTGGCACGGGCGCTAGCCGGCGTAGCACTGGAGCGGCAGGAGCTACGCATGCTTG
15 RPVTGAI PRMTVEGDE I RVF
AGCTCGGGGCTGGCGCCGATCAGACGGTGCCCGTCGATGCCCGCCAGATACATGTACGGG
20521 + + + + + + 20580
TCGAGCCCCGAGCGCGGCTAGTCTGCCACGGGCAGCTACGGGCGGTCTATGTACATGCCC
15 LEPSAGILRHGDIGALYMYP-
GAGGCGTTCCGCCCGCGCAGGCGCTGGTAGACGTCCGCGGGGTCGGCCGTCGAGCGGATG
20581 + + + + + + 20640
CTCCGCAAGGCGGGCGCGTCCGCGACCATCTGCAGGCGCCCCAGCCGGCAGCTCGCCTAC
15 SANRGRLRQYVDAPDATSRI-
GAGAGCTCGTGACCGATCTGCACCTGGTAGATGTCGCCGACGGCGATGTGCTTCAGACAC
20641 + + + + + + 20700
CTCTCGAGCACTGGCTAGACGTGGACCATCTACAGCGGCTGCCGCTACACGAAGTCTGTG
15 SLEHGIQVQYIDGVAIHKLC
CGCTCGACGTCGTTCGCGAACACTTCGGGGGCGCTGTCGTCGGTGACCGCGGAGGCGGGG
20701 + + + + + + 20760
GCGAGCTGCAGCAAGCGCTTGTGAAGCCCCCGCGACAGCAGCCACTGGCGCCTCCGCCCC
15 REVDNAFVEPASDDTVASAP
AAGCCGTCTGCGGACGGATCGGGCCAGGCCTGCTCCACGTCGGCGAGGAGCCCGGTGACG
20761 + + + + + + 20820
TTCGGCAGACGCCTGCCTAGCCCGGTCCGGACGAGGTGCAGCCGCTCCTCGGGCCACTGC
15 FGDAS PD PWAQEVDALLGTV
GTCTCCGGCGCGAGGCCGGGCCAGTACGGGGACTCGTGGAGCAGCAGTTCGCATCGGCCG
20821 + + + + + + 20880
CAGAGGCCGCGCTCCGGCCCGGTCATGCCCCTGAGCACCTCGTCGTCAAGCGTAGCCGGC
15 TEPALGPWYPSEHLLLECRG-
GTGGCGAGATCGGTGACCACGCTGCCCCGGTGCAGGACCATGCGTACGTCCGGCAGGCCA
20881 + + + + + + 20940
CACCGCTCTAGCCACTGGTGCGACGGGGCCACGTCCTGGTACGCATGCAGGCCGTCCGGT
15 TALDTVVSGRHLVMRVDPLG-
GGCCGGTTCTCGATGAGGTGGGGCAGGTCCTCGATGTAGCGGGCCGTGTCGTACCCGAAG
20941 + + + + + + 21000
CCGGCCAAGAGCTACTCCACCCCGTCCAGGAGCTACATCGCCCGGCACAGCATGGGCTTC
15 PRNEILHPLDEIYRATDYGF
AACCCGAGGAACCCGAAGCGGAAGCCGGACGCGGACCCCTCGGCGTCGAACATGTCCCGC
21001 + + + + + + 21060
TTGGGCTCCTTGGGCTTCGCCTTCGGCCTGCGCCTGGGGAGCCGCAGCTTGTACAGGGCG
15 FGLFGFRFGSASGEADFMDR
ATGGCCCGCAGCAGCGGCCACAACCCGCCCGCGGTACGCAGCCGCAGCCCCTGGGGGCCG
21061 + + + + + + 21120
TACCGGGCGTCGTCGCCGGTGTTGGGCGGGCGCCATGCGTCGGCGTCGGGGACCCCCGGC
15 MARLLPWLGGATRLRLGQPG-
TCCTCCAGGAGCGCGCCGGCCCGCTCCAGGAGCAGGCCCCGCAGGGCGGGTACGCCCTCG
21121 + + + + + + 21180
AGGAGGTCCTCGCGCGGCCGGGCGAGGTCCTCGTCCGGGGCGTCCCGCCCATGCGGGAGC
15 DELLAGARELLLGRLAPVGE-
ACGCGCACCACCCGGTCGGTGACCGAGAGCGAGAGCAGCGCGCCGAAGCCGACGAACTGG
21181 + + + + + + 21240
TGCGCGTGGTGGGCCAGCCACTGGCTCTCGCTCTCGTCGCGCGGCTTCGGCTGCTTGACC
15 VRVVRDTVSLSLLAGFGVFQ
TGCCTGCGGTCGCGGGCCGGGCCGGCCGCGGACTCCAGGAGGTAGACCTCGTCGGGGCCG
21241 + + + + + + 21300
ACGGACGCCAGCGCCCGGCCCGGCCGGCGCCTGAGGTCCTCCATCTGGAGCAGCCCCGGC
15 HRRDRAPGAASELLYVEDPG-
AAGTGCTCGGCCAGCGCGCGGTAGGCGGGCAGGGCGCCCGTCTCCTTCACATCGAGGCGT
21301 + + + + + + 21360
TTCACGAGCCGGTCGCGCGCCATCCGCCCGTCCCGCGGGCAGAGGAAGTGTAGCTCCGCA
15 FHEALARYAPLAGTEKVDLR
CGTGTCCGCACCCGCACCGGGGCCGAGACCACGCACTGGTCGGTCATCCTGGGTCCTCCC
21361 + + + + + + 21420
GCACAGGCGTGGGCGTGGCCCCGGCTCTGGTGCGTGACCAGCCAGTAGGACCCAGGAGGG
15-< RTRVRVPASVVCQDTM-
GGATCACGTGGTGATGGCGTAGCGGTGTGCCACCTGACGGGCGGTCAGCACCGCCCGGTC
21421 + + + + + + 21480
CCTAGTGCACCACTACCGCATCGCCACACGGTGGACTGCCCGCCAGTCGTGGCGGGCCAG
14-* * TTIAYRHAVQRATLVARD-
GGGGCCGGAGCGGTTGTCGACGACGCGCGCGGCCTTCCAGCTGACGAAGGAGCCGGTGTG
21481 + + + + + + 21540
CCCCGGCCTCGCCAACAGCTGCTGCGCGCGCCGGAAGGTCGACTGCTTCCTCGGCCACAC
14 PGSRNDVVRAAKWSVFSGTH-
GGTCACGGGGTCGAGGTCGGTGTCCACGACGATGCCGGCGTGCGCGCCGGTCCGCTCCCT
21541 + + + + + + 21600
CCAGTGCCCCAGCTCCAGCCACAGGTGCTGCTACGGCCGCACGCGCGGCCAGGCGAGGGA
14 TVPDLDTDVV I GAHAGTRER-
GAGCCGGGCGGCGACGGCCTCGCCGATGCCCTGCCGTTCCCCCTCGGCGCCGGCCAGCAG
21601 + + + + + + 21660
CTCGGCCCGCCGCTGCCGGAGCGGCTACGGGACGGCAAGGGGGAGCCGCGGCCGGTCGTC
14 LRAAVAEG I GQREGEAGALL-
GTCCATGCGCACGGTGACGGCGTCGCTGCCGTCGTCCTGCCGGTCGATGACGACCTGGTA
21661 + + + + + + 21720
CAGGTACGCGTGCCACTGCCGCAGCGACGGCAGCAGGACGGCCAGCTACTGCTGGACCAT
14 DMRVTVADSGDDQRDIVVQY-
GCCGAGGCAGCCGCCGACCCCGTCGAGGATCGCGGCCTCCAGCTCGGCGGGCTGGAGGGT
21721 + + + + + + 21780
CGGCTCCGTCGGCGGCTGGGGCAGCTCCTAGCGCCGGAGGTCGAGCCGCCCGACCTCCCA
14 GLCGGVGDLIAAELEAPQLT-
CACGTCGCCCAGGGGGATGCGGTCCGCGACCCGGCCGATGACCTGGATCCGCGGTCCCGG
21781 + + + + + + 21840
GTGCAGCGGGTCCCCCTACGCCAGGCGCTGGGCCGGCTACTGGACCTAGGCGCCAGGGCC
14 VDGLP I RDAVRG I VQ I RPGP-
CAGCGGCTCCCCGGGGCCCGCCGGGAGGATGCGGACCAGGTCCCCGGTGCGGTAGCGGAT
21841 + + + + + + 21900
GTCGCCGAGGGGCCCCGGGCGGCCCTCCTACGCCTGGTCCAGGGGCCACGCCATCGCCTA
14 LPEGPGAPLIRVLDGTRYRI-
CAGTGGTTTGATGCCGTCCACCAGCATGGTGAGGACGAGTTCGCCCTCTCCCGTGTCGCC
21901 + + + + + + 21960
GTCACCAAACTACGGCAGGTGGTCGTACCACTCCTGCTCAAGCGGGAGAGGGCACAGCGG
14 LPKIGDVLMTLVLEGEGTDG-
GACCACGGCGCCGGTGTCCGGTTCGACGAGTTCGGTCAAGTAGTTGGGCTGGGCGAGGTG
21961 + + + + + + 22020
CTGGTGCCGCGGCCACAGGCCAAGCTGCTCAAGCCAGTTCATCAACCCGACCCGCTCCAC
14 VVAGTDPEVLETLYNPQALH-
GAGCGCTCCGGTGTCCGCTCCGGTGGCGATGCACAGGGCTTCCTGGGAGCCGTAGAGCGT
22021 + + + + + + 22080
CTCGCGAGGCCACAGGCGAGGCCACCGCTACGTGTCCCGAAGGACCCTCGGCATCTCGCA
14 LAGTDAGTAI CLAEQSGYLT-
GGGCCGCACGACGGCTTGCGGCCAGAGGGTCGCCACGTTGTCGGCGAACTGCGGGGTGCA
22081 + + + + + + 22140
CCCGGCGTGCTGCCGAACGCCGGTCTCCCAGCGGTGCAACAGCCGCTTGACGCCCCACGT
14 PRVVAQPWLTAVNDAFQ PTC-
GATCTCACCCAGCGTGAGGAAGAGCTTCACGGGAAGCCGGGCCAGGTCGTAGCCGTAGTG
22141 + + + + + + 22200
CTAGAGTGGGTCGCACTCCTTCTCGAAGTGCCCTTCGGCCCGGTCCAGCATCGGCATCAC
14 IEGLTLFLKVPLRALDYGYH-
CAGGGCCGCCTTGGCAAGGCTCAGGCACAGCGCCGGAGCACAGACGACGACCTCGACCTC
22201 + + + + + + 22260
GTCCCGGCGGAACCGTTCCGAGTCCGTGTCGCGGCCTCGTGTCTGCTGCTGGAGCTGGAG
14 LAAKALS LCLAPACVVVEVE-
CAGCTCCTCGATCAGCCGCAGCGCCTTACGGAATCCCACCCTGGGGGACTCGGGCCAGAT
22261 + + + + + + 22320
GTCGAGGAGCTAGTCGGCGTCGCGGAATGCCTTAGGGTGGGACCCCCTGAGCCCGGTCTA
14 LEEILRLAKRFGVRPSEPWI-
CTTGACGTGACAGGCCCCCAGCTCCGCTGCCACCGCGGTGAACACGTCCCCGAACGCGTA
22321 + + + + + + 22380
GAACTGCACTGTCCGGGGGTCGAGGCGACGGTGGCGCCACTTGTGCAGGGGCTTGCGCAT
14 KVH CAGL EAAVAT FVDG F A Y -
CAGCTCCGACGGCCCCATCAGGCCCACGACGGGCATCCGCCCCCCGAACCTCGCTTCCAG
22381 + + + + + + 22440
GTCGAGGCTGCCGGGGTAGTCCGGGTGCTGCCCGTAGGCGGGGGGCTTGGAGCGAAGGTC
14 LESPGMLGVVPMRGGFRAEL-
CATGCGGCGCCAGGACTCCCGGACGGCGATGTTGCTGGTCGCGATGTCCTTCTCGCCGCG
22441 + + + + + + 22500
GTACGCCGCGGTCCTGAGGGCCTGCCGCTACAACGACCAGCGCTACAGGAAGAGCGGCGC
14 MRRWSERVAINSTAIDKEGR-
TGGGCACGGGGTGGCCGCCCCGGTGGTCCCGGTGGTCTCGTAGTAGATGCGTGCTTCGTG
22501 + + + + + + 22560
ACCCGTGCCCCACCGGCGGGGCCACCAGGGCCACCAGAGCATCATCTACGCACGAAGCAC
14 PCPTAAGTTGTTEYY I R A E H -
CAGCGGGCCCGACAGGACGTCGTGCATCTCCCGCCGCAGGTCGTCCTTGGTGGTGAAGGG
22561 + + + + + + 22620
GTCGCCCGGGCTGTCCTGCAGCACGTAGAGGGCGGCGTCCAGCAGGAACCACCACTTCCC
14 LPGSLVDHMERRLDDKTTFP-
CAGGTCCGCCAGGTTCGCGGGGGTGACGGCCTCGACGTCCACGCCTGCCAGATGGCGGCG
22621 + + + + + + 22680
GTCCAGGCGGTCCAAGCGCCCCCACTGCCGGAGCTGCAGGTGCGGACGGTCTACCGCCGC
14 LDALNAPTVAEVDVGALHRR-
GTAGAACGGCGAGCGGCGGGTGACGTGGCGCAGTACGGCCGTCAGCCGTTCGCCCTCCCA
22681 + + + + + + 22740
CATCTTGCCGCTCGCCGCCCACTGCACCGCGTCATGCCGGCAGTCGGCAAGCGGGAGGGT
14 YFPS RRTVHRLVATLREGEW-
GCGCTCGCGGTCGGCGGCGGTGAGTTCGCCGCGGTAGAACGCGTCGCTCACCTGCCCGTA
22741 + + + + + + 22800
CGCGAGCGCCAGCCGCCGCCACTCAAGCGGCGCCATCTTGCGCAGCGAGTGGACGGGCAT
14 RERDAATLEGRYFAD SVQGY-
GGCGGACCAGAACTCGCTGTCCGCGTCGGGGTCCAGCGGCCCGGTCCCGCCGGGACCGGG
22801 + + + + + + 22860
CCGCCTGGTCTTGAGCGACAGGCGCAGCCCCAGGTCGCCGGGCCAGGGCGGCCCTGGCCC
14 ASWFESDADPDLPGTGGPGP-
CCGCCGGCCGTCTCTCACGGCTGTGCCTGGAGTTCGTTGAGCGCGAGGCCGACCCGCTCG
22861 + + + + + + 22920
GGCGGCCGGCAGAGAGTGCCGACACGGACCTCAAGCAACTCGCGCTCCGGCTGGGCGAGC
14 -< RRGDRM-
21-* * PQAQLENLALGVRE -
TTGACCTCGTTGGAGGCCAGCACGTCCGAACGGCCGGTGAGCCGACGGTGTTCGTCGAGC
22921 + + + + + + 22980
AACTGGAGCAACCTCCGGTCGTGCAGGCTTGCCGGCCACTCGGCTGCCACAAGCAGCTCG
21 NVENSALVDSRGTLRRHEDL
AGTTCGATCATGTCCGTCATCCTCTCGACCAGGCGCGAGACGTTGGTGAGGCCCTCCTCG
22981 + + + + + + 23040
TCAAGCTAGTACAGGCAGTAGGAGAGCTGGTCCGCGCTCTGCAACCACTCCGGGAGGAGC
21 LEIMDTMREVLRSVNTLGEE
TCCTTGAGCGCGTCGCCCCGGTGCAGCGCGTGCACCGTCGCCGGGAAGCCGCTGCCCACC
23041 + + + + + + 23100
AGGAACTCGCGCAGCGGGGCCACGTCGCGCACGTGGCAGCGGCCCTTCGGCGACGGGTGG
21 DKLADGRHLAHVTAPFGSGV
AGGATCATCCGGTTGAGCAGGGCATTGACGGTCAGCTGAGCCCATACCTCGCCGGCGCTG
23101 + + + + + + 23160
TCCTAGTAGGCCAACTCGTCCCGTAACTGCCAGTCGACTCGGGTATGGAGCGGCCGCGAC
21 LIMRNLLANVTLQAWVEGAS-
TAGCGGCGGGCGACCGAGATGATCCCCGCGACCTTGTTGCTCAGCGGCCGGTCGAAGCGC
23161 + + + + + + 23220
ATCGCCGCCCGCTGGCTCTACTAGGGGCGCTGGAACAACGAGTCGCCGGCCAGCTTCGCG
21 YRRAVS I IGAVKNSLPRDFR-
AGATAACCGACTCCGGCACGCTCGATGAAGGTCTGCATGAGGCTGGCCGTGCCGAATCCG
23221 + + + + + + 23280
TCTATTGGCTGAGGCCGTGCGAGCTACTTCCAGACGTACTCCGACCGGCACGGCTTAGGC
21 LYGVGARE I FTQMLSATGFG
TGCACGGGCGCCGCGAAGATGATCCCGTCCGCCGCGACCATCTTCGCCACGACCTCGGGC
23281 + + + + + + 23340
ACGTGCCCGCGGCGCTTCTACTAGGGCAGGCGGCGCTGGTAGAAGCGGTGCTGGAGCCCG
21 HVPAAFI IGDAAVMKAVVEP
ACCCCGTCGGCCAGGGTGCAGGCCACCGGCCTGTCGTTGCAGTCCCCGCAGGGCCCGCAC
23341 + + + + + + 23400
TGGGGCAGCCGGTCCCACGTCCGGTGGCCGGACAGCAACGTCAGGGGCGTCCCGGGCGTG
21 VGDALTCAVPRDNCDGCPGC
CGCTCCATCCTGATCGAGCGCAGGTCGACGGCCTCGAAGTCGACGCCGCGGTTCTCTGCT
23401 + + + + + + 23460
GCGAGGTAGGACTAGCTCGCGTCCAGCTGCCGGAGCTTCAGCTGCGGCGCCAAGAGACGA
21 REMRI SRLDVAEFDVGRNEA-
ACGCGTGCCGCGTGCCGCAGTACGTCGGCGGTGTTGCCGTCACGTTCCGAACCGTTGATC
23461 + + + + + + 23520
TGCGCACGGCGCACGGCGTCATGCAGCCGCCACAACGGCAGTGCAAGGCTTGGCAACTAG
21 VRAAHRLVDATNGDRESGNI
GCGAGGATCTTGAGTTGTGCGCTCACGAGGGGCCTCCTTGGTGAGTCAGGTGCGCTCGGC
23521 + + + + + + 23580
CGCTCCTAGAACTCAACACGCGAGTGCTCCCCGGAGGAACCACTCAGTCCACGCGAGCCG
13-* * T R E A -
21-< ALIKLQASM-
GGTCGGCTCGGGGGAACTGTCTGGCCGCCGCTGGTCCGGGAGCCGCAGGGCCGGCTCGGC
23581 + + + + + + 23640
CCAGCCGAGCCCCCTTGACAGACCGGCGGCGACCAGGCCCTCGGCGTCCCGGCCGAGCCG
13 TPEPSSDPRRQDPLRLAPEA-
GGGGGCGGGAGGAAGACCGCCCCGCGGCGGGCCGCCACGCTCGCCGAACCGGATGAGGGG
23641 + + + + + + 23700
CCCCCGCCCTCCTTCTGGCGGGGCGCCGCCCGGCGGTGCGAGCGGCTTGGCCTACTCCCC
13 PAP PLGGRPPGGREGFRI LP-
CTTCTCGACGAGATAGAAGCTGATGGTCGCCAGCACGACGCTGATCGAGATCGTGAAGAG
23701 + + + + + + 23760
GAAGAGCTGCTCTATCTTCGACTACCAGCGGTCGTGCTGCGACTAGCTCTAGCACTTCTC
13 KEVLYFS ITALVVS I S ITFL-
GAACAGTTCCCAGAACCCCATGTCACCCCGGAATTCCGGCGTTGGCACGGGAGACTTGCC
23761 + + + + + + 23820
CTTGTCAAGGGTCTTGGGGTACAGTGGGGCCTTAAGGCCGCAACCGTGCCCTCTGAACGG
13 FLEWFGMDGRFE PTPVPSKG-
GAAGATGCTGCCGTTCCTGAGCCAGAGGTTGATCACGATCTCGTGCCAGAGGTAGACGCC
23821 + + + + + + 23880
CTTCTACGACGGCAAGGACTCGGTCTCCAACTAGTGCTAGAGCACGGTCTCCATCTGCGG
13 FISGNRLWLNIVIEHWLYVG-
GAGGGAGATCTGGCCGAGGAAGAGGATCGGCTTGCTGGTGAAGAGCGCGTCCGAGAACCG
23881 + + + + + + 23940
CTCCCTCTAGACCGGCTCCTTCTCCTAGCCGAACGACCACTTCTCGCGCAGGCTCTTGGC
13 LS IQGLFLI PKSTFLADS F R -
GGACTCGGCGCCGGGGACCGTCATCGGTGCCAGGAGCAGCAGGGTGAAGGAGGTCAGGAT
23941 + + + + + + 24000
CCTGAGCCGCGGCCCCTGGCAGTAGCCACGGTCCTCGTCGTCCCACTTCCTCCAGTCCTA
13 S EAGPVTMPALLLLTF STL I -
GAAGTGGTCGACGAGCTCCTGGGCCAGGGCCGCGTTGTCGCCCATGCCCGGGATGCCGAT
24001 + + + + + + 24060
CTTCACCAGCTGCTCGAGGACCCGGTCCCGGCGCAACAGCGGGTACGGGCCCTACGGCTA
13 FHDVLEQALAANDGMGP I G I -
GGGCTTGGTGGCGTAGAGGAGGTACAGCGGGATGAGCGGGACCCAGCAGATCAGCGGGCG
24061 + + + + + + 24120
CCCGAACCACCGCATCTCCTCCATGTCGCCCTACTCGCCCTGGGTCGTCTAGTCGCCCGC
13 PKTAYLLYLP ILPVWCILPR-
CCGGATCACGAAACGGTAGAAGCCCGGGGTCCCTGGCGTCGCCTCGGCGTACGCGGAGTA
24121 + + + + + + 24180
GGCCTAGTGCTTTGCCATCTTCGGGCCCCAGGGACCGCAGCGGAGCCGCATGCGCCTCAT
13 RIVFRYFGPTGPTAEAYASY-
GATGGCCAGTGCCATGCCCGCGGCGAAGCAGCCGGCGTAGTAGGGCGGCCAGTACCACTG
24181 + + + + + + 24240
CTACCGGTCACGGTACGGGCGCCGCTTCGTCGGCCGCATCATCCCGCCGGTCATGGTGAC
13 IALAMGAAFCGAYY P PWYWQ-
CATCGTCGCGCCGGTGGAGGGGAGGTTGGTGTACGTGACCCAGCCGATGGCCATGACTTC
24241 + + + + + + 24300
GTAGCAGCGCGGCCACCTCCCCTCCAACCACATGCACTGGGTCGGCTACCGGTACTGAAG
13 MTAGTS PLNTYTVWG IAMVE-
CAGCGCGGCCAGCGGCAGCAGGAGGCGGCGTGCCTTCTGCCCGGGAGTGCTGCCGCCCCG
24301 + + + + + + 24360
GTCGCGCCGGTCGCCGTCGTCCTCCGCCGCACGGAAGACGGGCCCTCACGACGGCGGGGC
13 LAAL PLLLRRAKQGPT SGGR-
CGCGAGCCGGTGGCCGATCCAGGCGATCAGCGGCAGGGCGAGGTAGAACGTGAACTCGGC
24361 + + + + + + 24420
GCGCTCGGCCACCGGCTAGGTCCGCTAGTCGCCGTCCCGCTCCATCTTGCACTTGAGCCG
13 ALRHGIWAILPLALYFTFEA-
GGGGACCGTCCAGGTGGGCTCGATGCCGTGCATCGGCTGGCCCTCGGGCAGATAGAAGTG
24421 + + + + + + 24480
CCCCTGGCAGGTCCACCCGAGCTACGGCACGTAGCCGACCGGGAGCCCGTCTATCTTCAC
13 PVTWTPE IGHMPQGEPLYFH-
CATGAGCAGCACGGGCCGCAGGACGTCGCTGACGCTGTCGATCTCGAACCAGTTGTAGCC
24481 + + + + + + 24540
GTACTCGTCGTGCCCGGCGTCCTGCAGCGACTGCGACAGCTAGAGCTTGGTCAACATCGG
13 MLLVPRLVDSVSDIEFWNYG-
GGGGATTGCGAAGACGAGCAACAGGTAGTAGGCGGGCAGGATGCGCAGGGCCCGGCGTTT
24541 + + + + + + 24600
CCCCTAACGCTTCTGCTCGTTGTCCATCATCCGCCCGTCCTACGCGTCCCGGGCCGCAAA
13 P IAFVLLLYYAPLI RLARRK-
GAGGAACCGTCCGGTGGCGGGCCGCTTCGTCCCACTGATGGTGACGCGGGCGTAGGGCTT
24601 + + + + + + 24660
CTCCTTGGCAGGCCACCGCCCGGCGAAGCAGGGTGACTACCACTGCGCCCGCATCCCGAA
13 LFRGTAPRKTGS I TVRAYPK-
GTACAGCATCATTCCGGACAGAGCGAAGAAGGGGGAAGGCATACCCCCAGACCGTCCGCG
24661 + + + + + + 24720
CATGTCGTAGTAAGGCCTGTCTCGCTTCTTCCCCCTTCCGTATGGGGGTCTGGCAGGCGC
13 -< YLMMGSLAFFPSPM-
AGGACGCCCCAGAACGGTTTGCCCGGCTCACCGACGAAGCTGCCCACTCCGGCCTGGAAG
24721 + + + + + + 24780
TCCTGCGGGGTCTTGCCAAACGGGCCGAGTGGCTGCTTCGACGGGTGAGGCCGGACCTTC
GCGACGTGGTAGACGACCACACCCAGCGCGAGGACACCTCGCAGTCCCTCGAACTTCGGT
24781 + + + + + + 24840
CGCTGCACCATCTGCTGGTGTGGGTCGCGCTCCTGTGGAGCGTCAGGGAGCTTGAAGCCA
ATTCGCTTGCTTTTTGCGCCACCTGCGTCGCGAAGGACGTCCCCCATGGAACAGTCCCCT
24841 + + + + + + 24900
TAAGCGAACGAAAAACGCGGTGGACGCAGCGCTTCCTGCAGGGGGTACCTTGTCAGGGGA
TTCCCTTGGCACTTGCTCGTTGACTTCCCGAAATAGTCGGGTCTGCGGAGTGTGAGCCGC
24901 + + + + + + 24960
AAGGGAACCGTGAACGAGCAACTGAAGGGCTTTATCAGCCCAGACGCCTCACACTCGGCG
ATCTCCAATCGTGCTGTTCCGGTGCTCAGGACGACTTGTTTCGGCCTGAGTGGGAAGGCA
24961 + + + + + + 25020
TAGAGGTTAGCACGACAAGGCCACGAGTCCTGCTGAACAAAGCCGGACTCACCCTTCCGT
12-* *SSKNRGSHSP
GCCACCCCCGCCGCCCCGCCTCGGCCAGACCGGGGGCCGAGGAGTCCCGTTCCGAGAGGA
25021 + + + + + + 25080
CGGTGGGGGCGGCGGGGCGGAGCCGGTCTGGCCCCCGGCTCCTCAGGGCAAGGCTCTCCT
12 LWGRRGAEALGPASSDRESL
TCGGAGTGATCTCCGGCGGCCAGGCGATGCCCACCTCCGGATCCAGCGGATTCAAGCCAT
25081 + + + + + + 25140
AGCCTCACTAGAGGCCGCCGGTCCGCTACGGGTGGAGGCCTAGGTCGCCTAAGTTCGGTA
12 IPTIEPPWAIGVEPDLPNLG
GTTCGAGCCGGGGGTCGTAGGCCGCCGAGCACAGGTAGACGATCACCGCCTCGTCGCTCA
25141 + + + + + + 25200
CAAGCTCGGCCCCCAGCATCCGGCGGCTCGTGTCCATCTGCTAGTGGCGGAGCAGCGAGT
12 HELRPDYAASCLYVIVAEDS
GCGTGAGGAATCCGAAGCCCAGCCCCGCGGAGACGTACAGCGCCCGTCCGTTCTCCTCGC
25201 + + + + + + 25260
CGCACTCCTTAGGCTTCGGGTCGGGGCGCCTCTGCATGTCGCGGGCAGGCAAGAGGAGCG
12 LTLFGFGLGASVYLARGNEE
CGAGCTCCACGGTCCGCCAGCCGCCGAAGGTGGGCGACCCCACCCGGATGTCGACCACGG
25261 + + + + + + 25320
GCTCGAGGTGCCAGGCGGTCGGCGGCTTCCACCCGCTGGGGTGGGCCTACAGCTGGTGCC
12 GLEVTRWGGFTPSGVRI DVV
CGCCGAACACGCTGCCGCGCAGGCAGCTGAAGTACTTGGCCTGGCCGGGTACGCCCCCGG
25321 + + + + + + 25380
GCGGCTTGTGCGACGGCGCGTCCGTCGACTTCATGAACCGGACCGGCCCATGCGGGGGCC
12 AGFVSGRLCSFYKAQGPVGG
CGAAGTGGATGCCCCGCAGCACCCCGTGGGAGGAGATCGCGCAGTTCGCCTGCCGCAGGT
25381 + + + + + + 25440
GCTTCACCTACGGGGCGTCGTGGGGCACCCTCCTCTAGCGCGTCAAGCGGACGGCGTCCA
12 AFHIGRLVGHSSIACNAQRL
CGAAGGAGTGGCCTACGGTGCGGCGGAAGGGCTCGCCCTGGAACCACTCGCGAAACGAGC
25441 + + + + + + 25500
GCTTCCTCACCGGATGCCACGCCGCCTTCCCGAGCGGGACCTTGGTGAGCGCTTTGCTCG
12 DFSHGVTRRFPEGQFWERFS
CCCGTTCGTCACGGAAGACCTGCTTCTCCTCCGTCCACGCTCCCGAGATCCCGATCGGCT
25501 + + + + + + 25560
GGGCAAGCAGTGCCTTCTGGACGAAGAGGAGGCAGGTGCGAGGGCTCTAGGGCTAGCCGA
12 GREDRFVQKEETWAGS I GI P
TCATCGCTGGCCCCTTCTCTCGACTTCTCTCGACGACTCGCGGGAGGCGGCCGAGGGGTC
25561 + + + + + + 25620
AGTAGCGACCGGGGAAGAGAGCTGAAGAGAGCTGCTGAGCGCCCTCCGCCGGCTCCCCAG
12-< K M -
CGCCGGGCCCGTGGGAACGCCGCAGTCTAGATGCGGCGGCACCGGGGGCAGGGGGGTGCG
25621 + + + + + + 25680
GCGGCCCGGGCACCCTTGCGGCGTCAGATCTACGCCGCCGTGGCCCCCGTCCCCCCACGC
GACGACGTCCGCCCCACCTCAGCACACCGGGAGATGCAGGTCGGTGACGGGCGACGTGAC
25681 + + + + + + 25740
CTGCTGCAGGCGGGGTGGAGTCGTGTGGCCCTCTACGTCCAGCCACTGCCCGCTGCACTG
GATGCAACGGTCCGAGGCCCGGTTGCCCGGACGACGGCCCACAGAGCCATCGGAGCAACG
25741 + + + + + + 25800
CTACGTTGCCAGGCTCCGGGCCAACGGGCCTGCTGCCGGGTGTCTCGGTAGCCTCGTTGC
GAGGCGGACCGCAGATGACCAAGCACGCCCGTGACCGCGCGGTAGTCCTCGGCGCAGGGA
25801 + + + + + + 25860
CTCCGCCTGGCGTCTACTGGTTCGTGCGGGCACTGGCGCGCCATCAGGAGCCGCGTCCCT
20-> MTKHARDRAVVLGAGM-
TGGCGGGGCTGCTCGCCGCGCGCGTCCTGTCCGAGACGTACAAGGAAGTGCTGGTGATCG
25861 + + + + + + 25920
ACCGCCCCGACGAGCGGCGCGCGCAGGACAGGCTCTGCATGTTCCTTCACGACCACTAGC
20 AGLLAARVLSETYKEVLVID-
ACCGGGACCGGTTGGGCGGCACGGAGCAGCGCCGCGGTGTCCCGCACGGACGCCACGCCC
25921 + + + + + + 25980
TGGCCCTGGCCAACCCGCCGTGCCTCGTCGCGGCGCCACAGGGCGTGCCTGCGGTGCGGG
20 RDRLGGTEQRRGVPHGRHAH-
ATGCGCTGCTGGCCAAGGGACAGCAGATCCTCAACGAACTCTTCCCCGGACTCGACACCG
25981 + + + + + + 26040
TACGCGACGACCGGTTCCCTGTCGTCTAGGAGTTGCTTGAGAAGGGGCCTGAGCTGTGGC
20 ALLAKGQQI LNELFPGLDTE-
AACTCACCTCGGCCGGAATCCCCGCCGGGGACATCGCCGGGAACCTGCGGTGGTACTTCA
26041 + + + + + + 26100
TTGAGTGGAGCCGGCCTTAGGGGCGGCCCCTGTAGCGGCCCTTGGACGCCACCATGAAGT
20 LTSAGI PAGD IAGNLRWYFN-
ACGGCCGCCGGCTCCAGCCCTTCGACACCGGGCTGATCAGCGTCTCGGCGACGAGGCCCG
26101 + + + + + + 26160
TGCCGGCGGCCGAGGTCGGGAAGCTGTGGCCCGACTAGTCGCAGAGCCGCTGCTCCGGGC
20 GRRLQPFDTGLI SVSATRPE-
AGCTGGAGTCCCACGTGCGCGCACGGGTCGCCGCGCTGCCACAGGTGAAGATCATGGACG
26161 + + + + + + 26220
TCGACCTCAGGGTGCACGCGCGTGCCCAGCGGCGCGACGGTGTCCACTTCTAGTACCTGC
20 L E SHVRARVAAL PQVKI MDG-
GGTGCGTGATCCGGGGCCTGACCGCCTCGGCCGACCGCAGCCGCGTCACCGGTGTCGAGG
26221 + + + + + + 26280
CCACGCACTAGGCCCCGGACTGGCGGAGCCGGCTGGCGTCGGCGCAGTGGCCACAGCTCC
20 CV I RGLTASADRS RVTGVEV-
TGGTCGACGAGTCGGGTACGGACACCCCGACGCGCCTGGAGGCCGACCTCGTCGTCGACG
26281 + + + + + + 26340
ACCAGCTGCTCAGCCCATGCCTGTGGGGCTGCGCGGACCTCCGGCTGGAGCAGCAGCTGC
20 VDESGTDTPTRLEADLVVDV-
TCACGGGGCGCGGCTCGCGGACTCCCGCCTGGCTGGAGGAGTTCGGATACGAGCGGCCCG
26341 + + + + + + 26400
AGTGCCCCGCGCCGAGCGCCTGAGGGCGGACCGACCTCCTCAAGCCTATGCTCGCCGGGC
20 TGRGSRTPAWLEEFGYERPA-
CGGAGGACCGCTTCAAGATCGATCTGGCGTACACCACGCGCCACTTCAAGCTCAAGGAAG
26401 + + + + + + 26460
GCCTCCTGGCGAAGTTCTAGCTAGACCGCATGTGGTGCGCGGTGAAGTTCGAGTTCCTTC
20 EDRFKIDLAYTTRHFKLKED-
ACCCCTACGGCACGGACCTGTCGATCAACCCGGTGGCATCGCCGAGCAACCCGCGCGGCG
26461 + + + + + + 26520
TGGGGATGCCGTGCCTGGACAGCTAGTTGGGCCACCGTAGCGGCTCGTTGGGCGCGCCGC
20 PYGTDLS I NPVAS PSNPRGA-
CGTTCTTCCCCCGGCTCGCGGACGGCAGCTCCCAGCTCTCCCTCACCGGAATCCTCGGCG
26521 + + + + + + 26580
GCAAGAAGGGGGCCGAGCGCCTGCCGTCGAGGGTCGAGAGGGAGTGGCCTTAGGAGCCGC
20 FFPRLADGS SQLSLTG I LGD-
ACCACCCGCCCACCGACGACGAGGGCTTCCTGGCGTTCGCCAAGTCGCTTGCCGCGCCGG
26581 + + + + + + 26640
TGGTGGGCGGGTGGCTGCTGCTCCCGAAGGACCGCAAGCGGTTCAGCGAACGGCGCGGCC
20 HP PTDDEGFLAFAKS LAAPE-
AGATCTACCGGGCCGTCCGCGATGCCGAACCTCTCGACGAACCGGTCACCTTCCGCTTCC
26641 + + + + + + 26700
TCTAGATGGCCCGGCAGGCGCTACGGCTTGGAGAGCTGCTTGGCCAGTGGAAGGCGAAGG
20 I YRAVRDAEPLDEPVTFRFP-
CGGCGAGCGTCCGCCGCCGTTACGAGAGGCTGCGCCGTTTCCCCGGCGGGTTCCTCGTCA
26701 + + + + + + 26760
GCCGCTCGCAGGCGGCGGCAATGCTCTCCGACGCGGCAAAGGGGCCGCCCAAGGAGCAGT
20 ASVRRRYERLRRFPGGFLVM-
TGGGCGACGGCGTGTGCAGCTTCAACCCCGTCTACGGCCAGGGCATGACGGTCGCCGCCC
26761 + + + + + + 26820
ACCCGCTGCCGCACACGTCGAAGTTGGGGCAGATGCCGGTCCCGTACTGCCAGCGGCGGG
20 GDGVC S FNPVYGQGMTVAAL-
TGGAGGCCGTGGCGCTGCGGGACCACTTGCGCGACGCCCCGGACCCCGACGCCCTGCGCT
26821 + + + + + + 26880
ACCTCCGGCACCGCGACGCCCTGGTGAACGCGCTGCGGGGCCTGGGGCTGCGGGACGCGA
20 EAVALRDHLRDAPDPDALRF-
TCTTCCGGCGTATCTCCACGGTCATCGACGTTCCGTGGGACATCGCCGCCGGAGCGGATC
26881 + + + + + + 26940
AGAAGGCCGCATAGAGGTGCCAGTAGCTGCAAGGCACCCTGTAGCGGCGGCCTCGCCTAG
20 FRRI STVIDVPWDIAAGADL-
TGAACTTCCCCGGGGTGGAGGGCCCCCGCACCATGAAGGTGAAGATGGCCAACGCCTACA
26941 + + + + + + 27000
ACTTGAAGGGGCCCCACCTCCCGGGGGCGTGGTACTTCCACTTCTACCGGTTGCGGATGT
20 NF PGVEG PRTMKVKMANAYM-
TGGCCCGCCTGCACGCAGCGGCAGCCGTCGACGGCGCGGTGACCGGGGCGTTCTTCCGGG
27001 + + + + + + 27060
ACCGGGCGGACGTGCGTCGCCGTCGGCAGCTGCCGCGCCACTGGCCCCGCAAGAAGGCCC
20 ARLHAAAAVDGAVTGAF FRV-
TGGCCGGGCTGGTGGACCCCCCGCAGGCCCTGATGCGCCCCTCCCTCGCCCTGCGGGTCA
27061 + + + + + + 27120
ACCGGCCCGACCACCTGGGGGGCGTCCGGGACTACGCGGGGAGGGAGCGGGACGCCCAGT
20 AGLVD P PQALMR P S LAL RVM-
TGCGCAACTCCTCGGCGAAGCCGTCGGTCCCTTCGGGCGCCGCCGTATGACCGCGCGGCC
27121 + + + + + + 27180
ACGCGTTGAGGAGCCGCTTCGGCAGCCAGGGAAGCCCGCGGCGGCATACTGGCGCGCCGG
20-* RNS SAKPSVPSGAAV*-
CGTCCGGGGCGGCTGCCGGGGCCAGGAGCCGACATGCGGGTGATGATCACGGTGTTCCCG
27181 + + + + + + 27240
GCAGGCCCCGCCGACGGCCCCGGTCCTCGGCTGTACGCCCACTACTAGTGCCACAAGGGC
19 MRVMITVFP
GCGCGGGCGCACTTCCTGCCGCTGGTGCCCTATGCCTGGGCCCTGCAGAGCGCGGGCCAC
27241 + + + + + + 27300
CGCGCCCGCGTGAAGGACGGCGACCACGGGATACGGACCCGGGACGTCTCGCGCCCGGTG
19 ARAHFLPLVPYAWALQSAGH
GAGGTATGTGTCGTGGCGCCCCCGGGCTATCCCACCGGGGTGGCCGACCCCGACTTCCAC
27301 + + + + + + 27360
CTCCATACACAGCACCGCGGGGGCCCGATAGGGTGGCCCCACCGGCTGGGGCTGAAGGTG
19 EVCVVAP PGYPTGVADPDFH
GAGGCCGTCACCGCGGCCGGCCTGAAGTCGGTGACCTGCGGGCAGCCGCAGCCGCTGGCG
27361 + + + + + + 27420
CTCCGGCAGTGGCGCCGGCCGGACTTCAGCCACTGGACGCCCGTCGGCGTCGGCGACCGC
19 EAVTAAGLKSVTCGQPQPLA
GTCCACGACCGCGACGACCCCGGCTACGCGGCGATGCTGCCGACCGCGGCGGAGTCGGAG
27421 + + + + + + 27480
CAGGTGCTGGCGCTGCTGGGGCCGATGCGCCGCTACGACGGCTGGCGCCGCCTCAGCCTC
19 VHDRDDPGYAAMLPTAAESE
CGCTACGTGGCGGCCCTCGGGATCAGCGAGAAGGAGCGCCCCACCTGGGACGTCTTCTAC
27481 + + + + + + 27540
GCGATGCACCGCCGGGAGCCCTAGTCGCTCTTCCTCGCGGGGTGGACCCTGCAGAAGATG
19 RYVAALGISEKERPTWDVFY
CACTTCACCTTGCTGGCGATCCGCGACTACCATCCGCCGCGGCCGCGGCAGGACGTGGAC
27541 + + + + + + 27600
GTGAAGTGGAACGACCGCTAGGCGCTGATGGTAGGCGGCGCCGGCGCCGTCCTGCACCTG
19 HFTLLAIRDYHPPRPRQDVD
CAGGTGATCGAGTTCGCCCGGATCTGGCAGCCCGATCTGGTGCTGTGGGACGCCTGGTTC
27601 + + + + + + 27660
GTCCACTAGCTCAAGCGGGCCTAGACCGTCGGGCTAGACCACGACACCCTGCGGACCAAG
19 QVIEFARIWQPDLVLWDAWF
CCCTCGGGCGCGATCGCGGCGCGGGTCAGCGGCGCCGCGCACGCGCGGGTGCTCGTAGCC
27661 + + + + + + 27720
GGGAGCCCGCGCTAGCGCCGCGCCCAGTCGCCGCGGCGCGTGCGCGCCCACGAGCATCGG
19 PSGAIAARVSGAAHARVLVA
CCCGACTACACCGGCTGGGTCACCGAGCGGTTCGCCGCCGCGGGCCCCGCGGCGGGGGCC
27721 + + + + + + 27780
GGGCTGATGTGGCCGACCCAGTGGCTCGCCAAGCGGCGGCGCCCGGGGCGCCGCCCCCGG
19 PDYTGWVTERFAAAGPAAGA
GACCTCCTGGCCGAGACGATGCGGCCGCTGGCCGAGCGGTACGGCGTGGAGGTCGACGAC
27781 + + + + + + 27840
CTGGAGGACCGGCTCTGCTACGCCGGCGACCGGCTCGCCATGCCGCACCTCCAGCTGCTG
19 DLLAETMRPLAERYGVEVDD
GATCTTCTGCTCGGACAGTGGACGGTCAATCCGTTCCCGGCGCCGATGAACCCGCCGACC
27841 + + + + + + 27900
CTAGAAGACGAGCCTGTCACCTGCCAGTTAGGCAAGGGCCGCGGCTACTTGGGCGGCTGG
19 DLLLGQWTVNPFPAPMNPPT
CGGCTCACGAACGTTCCGGTGCGCTACGTGCCCTACACCGGTGCCAGCGTCATGCCCGCG
27901 + + + + + + 27960
GCCGAGTGCTTGCAAGGCCACGCGATGCACGGGATGTGGCCACGGTCGCAGTACGGGCGC
19 RLTNVPVRYVPYTGASVMPA
TGGCTGTACGCGCGGCCGTCGCGGCCGCGGGTGGCGCTGTCGCTCGGAGTGTCCGCGCGG
27961 + + + + + + 28020
ACCGACATGCGCGCCGGCAGCGCCGGCGCCCACCGCGACAGCGAGCCTCACAGGCGCGCC
19 WLYARPSRPRVALSLGVSAR
GCGTTCCTCAAGGGTGACTGGGGGCGTACCGCCAAACTGCTGGAAGCGGTCGCGGAGCTG
28021 + + + + + + 28080
CGCAAGGAGTTCCCACTGACCCCCGCATGGCGGTTTGACGACCTTCGCCAGCGCCTCGAC
19 AFLKGDWGRTAKLLEAVAEL
GACATCGAGGTGATCGCCACGCTCAACGACAACCAACTGGCGGAGAGCGGGCCGCTGCCG
28081 + + + + + + 28140
CTGTAGCTCCACTAGCGGTGCGAGTTGCTGTTGGTTGACCGCCTCTCGCCCGGCGACGGC
19 DIEVIATLNDNQLAESGPLP
GACAACGTCCACACCCTCGACTACGTACCGCTCGACCAGTTGCTGCCCACCTGCTCGGCC
28141 + + + + + + 28200
CTGTTGCAGGTGTGGGAGCTGATGCATGGCGAGCTGGTCAACGACGGGTGGACGAGCCGG
19 DNVHTLDYVPLDQLLPTCSA
GTCATCCACCACGGATCGACGGGCACCTTCGCCGCGGCGAGCGCGGCCGGGCTGCCCCAG
28201 + + + + + + 28260
CAGTAGGTGGTGCCTAGCTGCCCGTGGAAGCGGCGCCGCTCGCGCCGGCCGGACGGGGTC
19 VIHHGSTGTFAAASAAGLPQ
GTGGTCTGCGACACCGACGAGCCCCTCCTGCTCTTCGGCGAGGACACCCCCGACGGCATC
28261 + + + + + + 28320
CACCAGACGCTGTGGCTGCTCGGGGAGGACGAGAAGCCGCTCCTGTGGGGGCTGCCGTAG
19 VVCDTDEPLLLFGEDTPDGI
GCGTGGGACTTCACCTGCCAGAAGCAGCTCACCGCGACGCTCACCTCCCGCGTGGTCACC
28321 + + + + + + 28380
CGCACCCTGAAGTGGACGGTCTTCGTCGAGTGGCGCTGCGAGTGGAGGGCGCACCAGTGG
19 AWDFTCQKQLTATLTSRVVT
GACTACGGGGCGGGGGTGCGCGTCGACCACCAGAAGCAGTCCGCCGGACAGATCCGTGAG
28381 + + + + + + 28440
CTGATGCCCCGCCCCCACGCGCAGCTGGTGGTCTTCGTCAGGCGGCCTGTCTAGGCACTC
19 DYGAGVRVDHQKQSAGQ I RE
CAACTACGCAGGGTGCTCACCGAACCTTCCTTCCGCGAGGGCGCTCGACGGATCCGGGAA
28441 + + + + + + 28500
GTTGATGCGTCCCACGAGTGGCTTGGAAGGAAGGCGCTCCCGCGAGCTGCCTAGGCCCTT
19 QLRRVLTEPSFREGARRIRE
GACCGGAATTCCGCCCCCAGCCCGGTCGAACTCGTATCGCTCCTGGTAGAACTGACGAAG
28501 + + + + + + 28560
CTGGCCTTAAGGCGGGGGTCGGGCCAGCTTGAGCATAGCGAGGACCATCTTGACTGCTTC
19 DRNSAPSPVELVSLLVELTK
CGTCATCGCCGTGACAAGGAGGCGGACCGATGAGGATGCTGGTGACGGGCGGAGCGGGTT
28561 + + + + + + 28620
GCAGTAGCGGCACTGTTCCTCCGCCTGGCTACTCCTACGACCACTGCCCGCCTCGCCCAA
19-* RHRRDKEADR*-
1-> MRMLVTGGAGF-
TCATCGGCTCGCAGTTCGTGCGGGCCACACTGCACGGCGAGCTGCCGGGTTCCGAGGACG
28621 + + + + + + 28680
AGTAGCCGAGCGTCAAGCACGCCCGGTGTGACGTGCCGCTCGACGGCCCAAGGCTCCTGC
1 IGSQFVRATLHGELPGSEDA-
CCCGGGTGACGGTCCTGGACAAGCTGACGTACTCCGGCAATCCGGCCAACCTCACCTCCG
28681 + + + + + + 28740
GGGCCCACTGCCAGGACCTGTTCGACTGCATGAGGCCGTTAGGCCGGTTGGAGTGGAGGC
1 RVTVLDKLTYSGNPANLTSV-
TCGCGGCCCATCCGCGGTACACCTTCGTCCAGGGCGACACCGTCGACCCGCGCGTCGTCG
28741 + + + + + + 28800
AGCGCCGGGTAGGCGCCATGTGGAAGCAGGTCCCGCTGTGGCAGCTGGGCGCGCAGCAGC
1 AAH PRYTFVQGDTVD P RVVD-
ACGAGGTGGTCGCCGGCCACGACGTCATCGTCCACTTCGCGGCGGAGTCGCACGTGGACC
28801 + + + + + + 28860
TGCTCCACCAGCGGCCGGTGCTGCAGTAGCAGGTGAAGCGCCGCCTCAGCGTGCACCTGG
1 EVVAGHDVIVHFAAES HVDR-
GCTCGATCGACACCGCCACCCGGTTCGTCACGACCAACGTGCTCGGGACCCAGACGCTGC
28861 + + + + + + 28920
CGAGCTAGCTGTGGCGGTGGGCCAAGCAGTGCTGGTTGCACGAGCCCTGGGTCTGCGACG
1 SIDTATRFVTTNVLGTQTLL-
TGGAAGCGGCTCTCCGGCACGGGGTCGGCCGGTTCGTGCACGTGTCGACCGACGAGGTCT
28921 + + + + + + 28980
ACCTTCGCCGAGAGGCCGTGCCCCAGCCGGCCAAGCACGTGCACAGCTGGCTGCTCCAGA
1 EAALRHGVGRFVHVS TDEVY-
ACGGGTCGATCGCCTCCGGCTCATGGACCGAGGACACCCCGCTCGCCCCCAACGTCCCCT
28981 + + + + + + 29040
TGCCCAGCTAGCGGAGGCCGAGTACCTGGCTCCTGTGGGGCGAGCGGGGGTTGCAGGGGA
1 GSIASGSWTEDTPLAPNVPY-
ACGCGGCGTCGAAGGCGGGTTCGGACCTGATGGCGCTCGCCTGGCACCGCACCCGGGGCC
29041 + + + + + + 29100
TGCGCCGCAGCTTCCGCCCAAGCCTGGACTACCGCGAGCGGACCGTGGCGTGGGCCCCGG
1 AASKAGSDLMALAWHRTRGL-
TGGACGTCGTCGTCACCCGGTGCACCAACAACTACGGTCCCTACCAGTACCCCGAGAAGG
29101 + + + + + + 29160
ACCTGCAGCAGCAGTGGGCCACGTGGTTGTTGATGCCAGGGATGGTCATGGGGCTCTTCC
1 DVVVTRCTNNYGPYQYPEKV-
TGATCCCGCTCTTCGTCACCAACATCCTCGACGGCTTGCGGGTGCCCCTGTACGGGGACG
29161 + + + + + + 29220
ACTAGGGCGAGAAGCAGTGGTTGTAGGAGCTGCCGAACGCCCACGGGGACATGCCCCTGC
1 IPLFVTNILDGLRVPLYGDG-
GCGCCCACCGCCGGGACTGGCTGCACGTGTCCGACCACTGCCGGGCCATCCAGATGGTCA
29221 + + + + + + 29280
CGCGGGTGGCGGCCCTGACCGACGTGCACAGGCTGGTGACGGCCCGGTAGGTCTACCAGT
1 AHRRDWLHVSDHCRAI QMVM-
TGAACTCCGGCCGGGCCGGGGAGGTCTACCACATCGGCGGCGGCACCGAACTCTCCAACG
29281 + + + + + + 29340
ACTTGAGGCCGGCCCGGCCCCTCCAGATGGTGTAGCCGCCGCCGTGGCTTGAGAGGTTGC
1 NSGRAGEVYHIGGGTELSNE-
AGGAACTCACCGGCCTGTTGCTCACGGCGTGCGGCACCGACTGGTCCTGCGTGGACCGGG
29341 + + + + + + 29400
TCCTTGAGTGGCCGGACAACGAGTGCCGCACGCCGTGGCTGACCAGGACGCACCTGGCCC
1 ELTGLLLTACGTDWS CVDRV-
TGGCCGACCGGCAGGGGCACGACCGCCGCTACTCGCTCGACATCACGAAGATCCGGCAGG
29401 + + + + + + 29460
ACCGGCTGGCCGTCCCCGTGCTGGCGGCGATGAGCGAGCTGTAGTGCTTCTAGGCCGTCC
1 ADRQGHDRRYSLDITKI RQE-
AACTGGGCTACGAGCCCCTGGTCGCCTTCGAGGACGGCCTGGCCGCGACGGTGAAGTGGT
29461 + + + + + + 29520
TTGACCCGATGCTCGGGGACCAGCGGAAGCTCCTGCCGGACCGGCGCTGCCACTTCACCA
1 LGYEPLVAFEDGLAATVKWY-
ACCACGAGAACCGTTCGTGGTGGCAGCCGCTGAAGGAAGCGGCCGGCCTCCTGGACGCCG
29521 + + + + + + 29580
TGGTGCTCTTGGCAAGCACCACCGTCGGCGACTTCCTTCGCCGGCCGGAGGACCTGCGGC
1 HENRSWWQPLKEAAGLLDAV-
TCGGCTGACGGCAGCCACCGCTAGGAACACCCCAGGAAAGGAGCCACCTCCGTGACAGCA
29581 + + + + + + 29640
AGCCGACTGCCGTCGGTGGCGATCCTTGTGGGGTCCTTTCCTCGGTGGAGGCACTGTCGT
2-> M T A
1-* G * -
GTCAAGGAGCCGACGTCCCGCGCAGGACGGCGGGAGTGGATCGCTCTCGTCGTCCTCTCC
29641 + + + + + + 29700
CAGTTCCTCGGCTGCAGGGCGCGTCCTGCCGCCCTCACCTAGCGAGAGCAGCAGGAGAGG
2 VKEPTSRAGRREWIALVVLS
TTGCCCACGATGCTGTTGATGCTGGACATCAACGTCCTCATGCTGGCCTTGCCGCAGTTG
29701 + + + + + + 29760
AACGGGTGCTACGACAACTACGACCTGTAGTTGCAGGAGTACGACCGGAACGGCGTCAAC
2 LPTMLLMLDINVLMLALPQL
AGCGAGGATCTCGGCGCGAGCAGCACGCAACAGCTGTGGATCACCGACATCTACGGATTC
29761 + + + + + + 29820
TCGCTCCTAGAGCCGCGCTCGTCGTGCGTTGTCGACACCTAGTGGCTGTAGATGCCTAAG
2 SEDLGASSTQQLWITDIYGF
GCGATCGCCGGCTTCCTGGTGACCATGGGCACCCTCGGCGACCGGATCGGCCGCCGCAGG
29821 + + + + + + 29880
CGCTAGCGGCCGAAGGACCACTGGTACCCGTGGGAGCCGCTGGCCTAGCCGGCGGCGTCC
2 AIAGFLVTMGTLGDRIGRRR
CTCCTGCTCGGGGGCGCGGCCGTCTTCGCGGTCGTGTCCGTCGTCGCCGCGTTCTCCGAC
29881 + + + + + + 29940
GAGGACGAGCCCCCGCGCCGGCAGAAGCGCCAGCACAGGCAGCAGCGGCGCAAGAGGCTG
2 LLLGGAAVFAVVSVVAAFSD
AGCGCGGCGATGCTCGTCGTCAGCCGCGCCGTGCTCGGCGTCGCCGGGGCCACGGTGATG
29941 + + + + + + 30000
TCGCGCCGCTACGAGCAGCAGTCGGCGCGGCACGAGCCGCAGCGGCCCCGGTGCCACTAC
2 SAAMLVVSRAVLGVAGATVM
CCCTCGACGCTCGCGCTCATCAGCAACATGTTCGAGGACCCCAAGGAGCGGGGCACCGCC
30001 + + + + + + 30060
GGGAGCTGCGAGCGCGAGTAGTCGTTGTACAAGCTCCTGGGGTTCCTCGCCCCGTGGCGG
2 PSTLALI SNMFEDPKERGTA
ATCGCCATGTGGGCGAGCGCCATGATGGCCGGAGTCGCCCTCGGGCCCGCCGTCGGCGGC
30061 + + + + + + 30120
TAGCGGTACACCCGCTCGCGGTACTACCGGCCTCAGCGGGAGCCCGGGCGGCAGCCGCCG
2 IAMWASAMMAGVALGPAVGG
CTGGTCCTCGCCGCGTTCTGGTGGGGATCGGTGTTCCTCATCGCCGTTCCGGTGATGCTG
30121 + + + + + + 30180
GACCAGGAGCGGCGCAAGACCACCCCTAGCCACAAGGAGTAGCGGCAAGGCCACTACGAC
2 LVLAAFWWGSVFL IAVPVML
CTGGTGGTGGTCACCGGCCCCGTGCTGCTCACCGAGTCCCGCGACCCGGACGCCGGACGG
30181 + + + + + + 30240
GACCACCACCAGTGGCCGGGGCACGACGAGTGGCTCAGGGCGCTGGGCCTGCGGCCTGCC
2 LVVVTGPVLLTESRDPDAGR
CTGGACCTGCTGAGCGCGGGGCTCTCCCTCGCGACCGTGCTGCCGGTGATCTACGGACTG
30241 + + + + + + 30300
GACCTGGACGACTCGCGCCCCGAGAGGGAGCGCTGGCACGACGGCCACTAGATGCCTGAC
2 LDLLSAGLSLATVLPVIYGL
AAGGAGCTGGCCCGGACCGGGTGGGACCCGCTCGCCGCCGGCGCGGTGGTCCTCGGCGTG
30301 + + + + + + 30360
TTCCTCGACCGGGCCTGGCCCACCCTGGGCGAGCGGCGGCCGCGCCACCAGGAGCCGCAC
2 KELARTGWDPLAAGAVVLGV
ATCTTCGGCGCGCTGTTCGTCCAGCGCCAGCGGCGGTTGGCCGACCCCATGCTGGACCTC
30361 + + + + + + 30420
TAGAAGCCGCGCGACAAGCAGGTCGCGGTCGCCGCCAACCGGCTGGGGTACGACCTGGAG
2 I FGALFVQRQRRLAD PMLDL
GGCCTCTTCGCCGACCGCACCCTGCGGGCGGGTCTGACGGTCAGTCTGGTCAACGCCGTC
30421 + + + + + + 30480
CCGGAGAAGCGGCTGGCGTGGGACGCCCGCCCAGACTGCCAGTCAGACCAGTTGCGGCAG
2 GLFADRTLRAGLTVSLVNAV
ATCATGGGCGGGACCGGACTGATGGTCGCCCTGTACCTCCAGACGATCGCCGGTCACTCC
30481 + + + + + + 30540
TAGTACCCGCCCTGGCCTGACTACCAGCGGGACATGGAGGTCTGCTAGCGGCCAGTGAGG
2 IMGGTGLMVALYLQTIAGHS
CCGTTGGCCGCCGGGCTGTGGCTGCTGATCCCGGCCTGCATGCTCGTCGTGGGCGTACAG
30541 + + + + + + 30600
GGCAACCGGCGGCCCGACACCGACGACTAGGGCCGGACGTACGAGCAGCACCCGCATGTC
2 PLAAGLWLLI PACMLVVGVQ
CTGTCGAACCTGCTGGCCCAGCGGATGCCCCCTTCCCGGGTGCTGCTGGGGGGACTGCTG
30601 + + + + + + 30660
GACAGCTTGGACGACCGGGTCGCCTACGGGGGAAGGGCCCACGACGACCCCCCTGACGAC
2 LSNLLAQRMPPSRVLLGGLL
ATCGCGGCCGTCGGACAGCTCCTGATCACCCAGGTGGACACCGAGGACACCGCCCTCCTC
30661 + + + + + + 30720
TAGCGCCGGCAGCCTGTCGAGGACTAGTGGGTCCACCTGTGGCTCCTGTGGCGGGAGGAG
2 IAAVGQLLITQVDTEDTALL
ATCGCGGCCACCACCCTGATCTACTTCGGCGCCTCACCGGTGGGGCCGATCACCACGGGC
30721 + + + + + + 30780
TAGCGCCGGTGGTGGGACTAGATGAAGCCGCGGAGTGGCCACCCCGGCTAGTGGTGCCCG
2 IAATTLIYFGASPVGPITTG
GCGATCATGGGAGCCGCGCCCCCGGAGAAGGCGGGTGCCGCCTCGTCGCTGTCCGCCACC
30781 + + + + + + 30840
CGCTAGTACCCTCGGCGCGGGGGCCTCTTCCGCCCACGGCGGAGCAGCGACAGGCGGTGG
2 AIMGAAPPEKAGAASSLSAT
GGCGGCGAGTTCGGAGTGGCGCTCGGCATCGCGGGCCTGGGGAGTCTGGGCACCGTCGTG
30841 + + + + + + 30900
CCGCCGCTCAAGCCTCACCGCGAGCCGTAGCGCCCGGACCCCTCAGACCCGTGGCAGCAC
2 GGEFGVALGIAGLGSLGTVV
TACAGCGCCGGGGTCGAGGTGCCGGACGCGGCCGGGCCCGCCGACGCCGACGCCGCGCAG
30901 + + + + + + 30960
ATGTCGCGGCCCCAGCTCCACGGCCTGCGCCGGCCCGGGCGGCTGCGGCTGCGGCGCGTC
2 YSAGVEVPDAAGPADADAAQ
GAGAGCATCGCCGGCGCCCTGCACACGGCCGGTCAGCTGGCACCGGGCAGCGCCGACGCC
30961 + + + + + + 31020
CTCTCGTAGCGGCCGCGGGACGTGTGCCGGCCAGTCGACCGTGGCCCGTCGCGGCTGCGG
ESIAGALHTAGQLAPGSADA
CTGCTGGACTCCGCGCGCGCGGCCTTCACCAGCGGCGTGCAGTCCGTCGCCGCCGTCTGC
31021 + + + + + + 31080
GACGACCTGAGGCGCGCGCGCCGGAAGTGGTCGCCGCACGTCAGGCAGCGGCGGCAGACG
LLDSARAAFTSGVQSVAAVC
GCCGTGTTCTCCCTGGCGCTCGCCGTCCTCATCGGCACCCGGCTGCGGGACATTTCCGCG
31081 + + + + + + 31140
CGGCACAAGAGGGACCGCGAGCGGCAGGAGTAGCCGTGGGCCGACGCCCTGTAAAGGCGC
2 AVFSLALAVLIGTRLRDISA
ATGGACCACGGGCACGGCGAGGAACCGGCCGAGAACGACGCTCAACCGGCCACATGAGCG
31141 + + + + + + 31200
TACCTGGTGCCCGTGCCGCTCCTTGGCCGGCTCTTGCTGCGAGTTGGCCGGTGTACTCGC
2- * MDHGHGEEPAENDAQPAT*-
CACTTCCGGAGATGCAACGGCCGCCGTCGAGGTATGAGGATCACCTTCCGGGGTGCACCT
31201 + + + + + + 31260
GTGAAGGCCTCTACGTTGCCGGCGGCAGCTCCATACTCCTAGTGGAAGGCCCCACGTGGA
GCACGGCAACGGAGGCGTAGTGGAGTACTGGAACAGCACGGCGGAGACCATGCCCCGCCA
31261 + + + + + + 31320
CGTGCCGTTGCCTCCGCATCACCTCATGACCTTGTCGTGCCGCCTCTGGTACGGGGCGGT
3- > MEYWNSTAETMPRQ
GGAACTCGAACAGTGGAAGTGGCGCAGGCTCCAGGCCGCCATGGACCACGCCAGAAGGCT
31321 + + + + + + 31380
CCTTGAGCTTGTCACCTTCACCGCGTCCGAGGTCCGGCGGTACCTGGTGCGGTCTTCCGA
3 ELEQWKWRRLQAAMDHARRL
TTCGCCCTTCTGGCGGGAACGACTCCCCGAGAACATCACCTCCATGGCGGACTACGCGGC
31381 + + + + + + 31440
AAGCGGGAAGACCGCCCTTGCTGAGGGGCTCTTGTAGTGGAGGTACCGCCTGATGCGCCG
3 SPFWRERLPENITSMADYAA-
GCGGGTGCCTCTCCTGCGCAAGGCCGACCTCCTCGCCGCGGAAGCCGCGTCTCCCCCTTA
31441 + + + + + + 31500
CGCCCACGGAGAGGACGCGTTCCGGCTGGAGGAGCGGCGCCTTCGGCGCAGAGGGGGAAT
3 RVPLLRKADLLAAEAASPPY-
CGGCACCTGGCCCTCGCTGGATCCGGCGCTCGGAGTGCGCCATCACCAGACCAGCGGCAC
31501 + + + + + + 31560
GCCGTGGACCGGGAGCGACCTAGGCCGCGAGCCTCACGCGGTAGTGGTCTGGTCGCCGTG
3 GTWPSLDPALGVRHHQTSGT-
CAGCGGTAACCCCCCCATCCGGACGTTCGACACCGAACGCGACTGGGCCTGGTGCGTGGA
31561 + + + + + + 31620
GTCGCCATTGGGGGGGTAGGCCTGCAAGCTGTGGCTTGCGCTGACCCGGACCACGCACCT
3 SGNPPIRTFDTERDWAWCVD-
CACGTTCTGCACGGCGCTCCACAGCATGGGCGTGCGCCCGCACCACAAGGGTCTGGTGGC
31621 + + + + + + 31680
GTGCAAGACGTGCCGCGAGGTGTCGTACCCGCACGCGGGCGTGGTGTTCCCAGACCACCG
3 TFCTALHSMGVRPHHKGLVA-
GTTCGGCTACGGGCTGTTCGCCGGTTTCTGGGGCATGCACTACGGCCTCGAGCGCATGGG
31681 + + + + + + 31740
CAAGCCGATGCCCGACAAGCGGCCAAAGACCCCGTACGTGATGCCGGAGCTCGCGTACCC
3 FGYGLFAGFWGMHYGLERMG-
CGCCACGGTCATCCCGGCCGGCGGCCTCGACTCCCGCTCCCGGGTACGGCTGCTGGTCGA
31741 + + + + + + 31800
GCGGTGCCAGTAGGGCCGGCCGCCGGAGCTGAGGGCGAGGGCCCATGCCGACGACCAGCT
3 ATVIPAGGLDSRSRVRLLVD
CTACCAGATCGAGGTGCTCGGCCTCACACCGAGCTATGCGATGCGGCTGATCGAGACGGC
31801 + + + + + + 31860
GATGGTCTAGCTCCACGAGCCGGAGTGTGGCTCGATACGCTACGCCGACTAGCTCTGCCG
3 YQIEVLGLTPSYAMRLIETA-
CCGCGAGATGGGCATCGACCTCGCCCGCGAGGCTAACGTCCAGATCATCCTGGCCGGGGC
31861 + + + + + + 31920
GGCGCTCTACCCGTAGCTGGAGCGGGCGCTCCGATTGCAGGTCTAGTAGGACCGGCCCCG
3 REMGIDLAREANVQIILAGA-
GGAGCCGCGCTCCGCGTTCACCACCCGCACCATCGAGGAGGCCTTCGGCGCCCGGGTCTT
31921 + + + + + + 31980
CCTCGGCGCGAGGCGCAAGTGGTGGGCGTGGTAGCTCCTCCGGAAGCCGCGGGCCCAGAA
3 EPRSAFTTRTIEEAFGARVF
CAACGCCGCGGGCACCACTGAGTTCGGGGGGGTGTTCATGTTCGAGTGCACCGCCCGGCG
31981 + + + + + + 32040
GTTGCGGCGCCCGTGGTGACTCAAGCCCCCCCACAAGTACAAGCTCACGTGGCGGGCCGC
3 NAAGTTEFGGVFMFECTARR
CGAGGCCTGCCACATCATCGAACCCTCGTGCATCGAGGAGGTGCTCGACCCGGTGACGGA
32041 + + + + + + 32100
GCTCCGGACGGTGTAGTAGCTTGGGAGCACGTAGCTCCTCCACGAGCTGGGCCACTGCCT
3 EACHIIEPSCIEEVLDPVTE-
ACAGCCCGTCGGCTACGGCGAGGAGGGCGTCCGAGTCACCACCGGGCTGAACCGTGAGGG
32101 + + + + + + 32160
TGTCGGGCAGCCGATGCCGCTCCTCCCGCAGGCTCAGTGGTGGCCCGACTTGGCACTCCC
3 QPVGYGEEGVRVTTGLNREG-
GATGCAGCTCTTCCGGCACTGGACCGAGGACGTCGTGGTCAAGCGGCCCCACACCGAGTG
32161 + + + + + + 32220
CTACGTCGAGAAGGCCGTGACCTGGCTCCTGCAGCACCAGTTCGCCGGGGTGTGGCTCAC
3 MQLFRHWTEDVVVKRPHTEC
CGGCTGCGGCCGGACGTGGGACTTCTACGACGGCGGCATCCTTCGGCGCGTGGACGACAT
32221 + + + + + + 32280
GCCGACGCCGGCCTGCACCCTGAAGATGCTGCCGCCGTAGGAAGCCGCGCACCTGCTGTA
3 GCGRTWDFYDGGILRRVDDM-
GCGCAAGATACGCGGGGTCTCGATCACCCCGGTGATGATCGAGGATGTGCTGCGCGGCTT
32281 + + + + + + 32340
CGCGTTCTATGCGCCCCAGAGCTAGTGGGGCCACTACTAGCTCCTACACGACGCGCCGAA
3 RKIRGVSITPVMIEDVLRGF-
CGACGAGGTGAACGAGTTCCACTCGTCCATCCGGACCGTCCGCGGACTCGATACGATCCA
32341 + + + + + + 32400
GCTGCTCCACTTGCTCAAGGTGAGCAGGTAGGCCTGGCAGGCGCCTGAGCTATGCTAGGT
3 DEVNEFHSSIRTVRGLDTIH-
CGTCAAGGTCGAGGCGGGAGACATCTCGGGTGAGGCGGCCGAGAGCCTGTGCGGCCGCAT
32401 + + + + + + 32460
GCAGTTCCAGCTCCGCCCTCTGTAGAGCCCACTCCGGCGGCTCTCGGACACGCCGGCGTA
3 VKVEAGDISGEAAESLCGRI
CACCGAGGAGTTCAAGCGTGAGATAGGCATACGGCCCCAGGTGGAGCTGACCCCCGCGGG
32461 + + + + + + 32520
GTGGCTCCTCAAGTTCGCACTCTATCCGTATGCCGGGGTCCACCTCGACTGGGGGCGCCC
3 TEEFKREIGIRPQVELTPAG-
CAGCCTCCCCCGATCGAAGTGGAAGGCGGCACGACTTCATGACGAGCGCGAACTCGCCCC
32521 + + + + + + 32580
GTCGGAGGGGGCTAGCTTCACCTTCCGCCGTGCTGAAGTACTGCTCGCGCTTGAGCGGGG
3 SLPRSKWKAARLHDERELAP
TCAGGCCTGAGCAGGTGGAGCAGCTCCTGGTGAGCTACCGGAGCCTGGGCCTGCTGGAGC
32581 + + + + + + 32640
AGTCCGGACTCGTCCACCTCGTCGAGGACCACTCGATGGCCTCGGACCCGGACGACCTCG
3-* Q A * -
AGAGCTGCGCGGTCCCGGCCGTGCTCGCCGCGGTCAGGGCCGCCCGTGCGGAACTCCGTA
32641 + + + + + + 32700
TCTCGACGCGCCAGGGCCGGCACGAGCGGCGCCAGTCCCGGCGGGCACGCCTTGAGGCAT
TCGCCCTGGACGGCCAGGGCGTGGAGTTCGAGTACTACCGGGGGCACGACGACAGCCTCG
32701 + + + + + + 32760
AGCGGGACCTGCCGGTCCCGCACCTCAAGCTCATGATGGCCCCCGTGCTGCTGTCGGAGC
TGGCCTGAACCCACCCCCGGTCCGCCGGGTCAGACGAAAGGGAGACCGGTGCCCCACGGT
32761 + + + + + + 32820
ACCGGACTTGGGTGGGGGCCAGGCGGCCCAGTCTGCTTTCCCTCTGGCCACGGGGTGCCA
> M P H G
GCAGAGCGCGAAGCGAGCCCGGCCGAGGAGAGCGCCGGCACCCGGCCGCTGACCGGCGAG
32821 + + + + + + 32880
CGTCTCGCGCTTCGCTCGGGCCGGCTCCTCTCGCGGCCGTGGGCCGGCGACTGGCCGCTC
AEREASPAEESAGTRPLTGE
GAGTATCTGGAGAGCCTGCGGGACGCGCGGGAGGTGTACCTCGACGGCAGCCGCGTCAAG
32881 + + + + + + 32940
CTCATAGACCTCTCGGACGCCCTGCGCGCCCTCCACATGGAGCTGCCGTCGGCGCAGTTC
EYLESLRDAREVYLDGSRVK
GACGTCACCGCGCATCCCGCGTTCCACAACCCGGCCCGGATGACGGCCCGGCTGTACGAC
32941 + + + + + + 33000
CTGCAGTGGCGCGTAGGGCGCAAGGTGTTGGGCCGGGCCTACTGCCGGGCCGACATGCTG
DVTAHPAFHNPARMTARLYD
AGCCTGCACGACCCCGCCCAGAAAGCGGTCCTGACGGCGCCCACCGATGCCGGTGACGGT
33001 + + + + + + 33060
TCGGACGTGCTGGGGCGGGTCTTTCGCCAGGACTGCCGCGGGTGGCTACGGCCACTGCCA
SLHDPAQKAVLTAPTDAGDG
TTCACCCACCGCTTCTTCACCGCACCGCGCAGCGTCGACGACCTGGTCAAGGACCAGGCC
33061 + + + + + + 33120
AAGTGGGTGGCGAAGAAGTGGCGTGGCGCGTCGCAGCTGCTGGACCAGTTCCTGGTCCGG
FTHRFFTAPRSVDDLVKDQA
GCCATCGCATCCTGGGCGCGCAAGAGCTACGGCTGGATGGGGCGCAGCCCCGACTACAAG
33121 + + + + + + 33180
CGGTAGCGTAGGACCCGCGCGTTCTCGATGCCGACCTACCCCGCGTCGGGGCTGATGTTC
AIASWARKSYGWMGRSPDYK
GCGTCGTTCCTCGGCACGCTGGGGGCCAACGCCGACTTCTACGAGCCCTTCGCGGACAAC
33181 + + + + + + 33240
CGCAGCAAGGAGCCGTGCGACCCCCGGTTGCGGCTGAAGATGCTCGGGAAGCGCCTGTTG
ASFLGTLGANADFYEPFADN
GCCCGGCGCTGGTACCGGGAGTCGCAGGAGAAGGTGCTGTACTGGAACCATGCCTTCCTT
33241 + + + + + + 33300
CGGGCCGCGACCATGGCCCTCAGCGTCCTCTTCCACGACATGACCTTGGTACGGAAGGAA
ARRWYRESQEKVLYWNHAFL
CACCCGCCGGTCGACCGCTCGCTGCCCGCCGACGAGGTGGGCGACGTCTTCATCCACGTC
33301 + + + + + + 33360
GTGGGCGGCCAGCTGGCGAGCGACGGGCGGCTGCTCCACCCGCTGCAGAAGTAGGTGCAG
HPPVDRSLPADEVGDVFIHV
GAGCGGGAGACCGACGCGGGCCTGGTGGTGAGCGGGGCCAAGGTCGTCGCGACCGGATCG
33361 + + + + + + 33420
CTCGCCCTCTGGCTGCGCCCGGACCACCACTCGCCCCGGTTCCAGCAGCGCTGGCCTAGC
ERETDAGLVVSGAKVVATGS
GCCCTCACCCACGCGGCGTTCATCTCGCACTGGGGACTTCCCATCAAGGACCGGAAGTTC
33421 + + + + + + 33480
CGGGAGTGGGTGCGCCGCAAGTAGAGCGTGACCCCTGAAGGGTAGTTCCTGGCCTTCAAG
ALTHAAFISHWGLPIKDRKF
GCCCTGGTGGCCACCGTGCCGATGGACGCGGACGGCCTCAAGGTGATCTGCCGTCCCTCC
33481 + + + + + + 33540
CGGGACCACCGGTGGCACGGCTACCTGCGCCTGCCGGAGTTCCACTAGACGGCAGGGAGG
ALVATVPMDADGLKVICRPS
TACTCCGCAAACGCGGCGACCACGGGCAGCCCGTTCGACAACCCGCTGTCCTCACGGCTG
33541 + + + + + + 33600
ATGAGGCGTTTGCGCCGCTGGTGCCCGTCGGGCAAGCTGTTGGGCGACAGGAGTGCCGAC
YSANAATTGSPFDNPLSSRL
GACGAGAACGACGCCATCCTCGTACTCGACCAGGTGCTGATCCCCTGGGAGAACGTGTTC
33601 + + + + + + 33660
CTGCTCTTGCTGCGGTAGGAGCATGAGCTGGTCCACGACTAGGGGACCCTCTTGCACAAG
DENDAILVLDQVLIPWENVF
GTCTACGGCAACCTGGGCAAGGTACATCTCCTCGCCGGACAGTCCGGGATGATCGAACGC
33661 + + + + + + 33720
CAGATGCCGTTGGACCCGTTCCATGTAGAGGAGCGGCCTGTCAGGCCCTACTAGCTTGCG
VYGNLGKVHLLAGQSGMIER
GCCACCTTCCACGGGTGCACCCGGCTCGCCGTGAAGCTGGAGTTCATCGCCGGGCTGCTG
33721 + + + + + + 33780
CGGTGGAAGGTGCCCACGTGGGCCGAGCGGCACTTCGACCTCAAGTAGCGGCCCGACGAC
ATFHGCTRLAVKLEFIAGLL
GCCAAGGCGCTGGACATCACCGGGGCGAAGGACTTCCGCGGTGTGCAGACCCGGCTCGGA
33781 + + + + + + 33840
CGGTTCCGCGACCTGTAGTGGCCCCGCTTCCTGAAGGCGCCACACGTCTGGGCCGAGCCT
AKALDITGAKDFRGVQTRLG
GAAGTCCTGGCCTGGCGCAACCTCTTCTGGTCACTGTCGGACGCGGCGGCCCGCAACCCC
33841 + + + + + + 33900
CTTCAGGACCGGACCGCGTTGGAGAAGACCAGTGACAGCCTGCGCCGCCGGGCGTTGGGG
EVLAWRNLFWSLSDAAARNP
GTCCCCTGGAAGAACGGCACGCTCCTGCCCAACCCTCAGGCGGGTATGGCCTACCGCTGG
33901 + + + + + + 33960
CAGGGGACCTTCTTGCCGTGCGAGGACGGGTTGGGAGTCCGCCCATACCGGATGGCGACC
VPWKNGTLLPNPQAGMAYRW
TTCATGCAGATCGGCTACCCGCGGGTCCTGGAGATCGTCCAACAGGACGTGGCCAGCGGC
33961 + + + + + + 34020
AAGTACGTCTAGCCGATGGGCGCCCAGGACCTCTAGCAGGTTGTCCTGCACCGGTCGCCG
FMQIGYPRVLEIVQQDVASG
CTCATGTACGTCAACTCCTCCACGGAGGACTTCCGCAACCCCGAGACCGGCCCCTACTTG
34021 + + + + + + 34080
GAGTACATGCAGTTGAGGAGGTGCCTCCTGAAGGCGTTGGGGCTCTGGCCGGGGATGAAC
LMYVNSSTEDFRNPETGPYL
GAGAAGTACCTCCGGGGCAGCGACGGCGCAGGCGCCGTCGAGCGTGTCAAGGTGATGAAG
34081 + + + + + + 34140
CTCTTCATGGAGGCCCCGTCGCTGCCGCGTCCGCGGCAGCTCGCACAGTTCCACTACTTC
EKYLRGSDGAGAVERVKVMK
CTGCTGTGGGACGCGGTGGGATCCGACTTCGGCGGCCGGCACGAACTCTACGAGCGGAAC
34141 + + + + + + 34200
GACGACACCCTGCGCCACCCTAGGCTGAAGCCGCCGGCCGTGCTTGAGATGCTCGCCTTG
LLWDAVGSDFGGRHELYERN
TACTCCGGGAACCACGAGAACACCCGGATCGAGTTGCTGCTGTCGCAGACGGCGAGCGGC
34201 + + + + + + 34260
ATGAGGCCCTTGGTGCTCTTGTGGGCCTAGCTCAACGACGACAGCGTCTGCCGCTCGCCG
YSGNHENTRIELLLSQTASG
AAACTGGACTCGTACATGGACTTCGCCCAGGCATGCATGGACGAGTACGACCTGGACGGC
34261 + + + + + + 34320
TTTGACCTGAGCATGTACCTGAAGCGGGTCCGTACGTACCTGCTCATGCTGGACCTGCCG
4 KLDSYMDFAQACMDEYDLDG
TGGACCGCTCCCGACCTGGAGTCGTTTCACGCGATGCGTTCCGCCTCCCGCGACCTTCTC
34321 + + + + + + 34380
ACCTGGCGAGGGCTGGACCTCAGCAAAGTGCGCTACGCAAGGCGGAGGGCGCTGGAAGAG
4 WTAPDLESFHAMRSASRDLL
GGAGGGCTGTAGTTCCCCGACGGTGTACTGCGGCCCCCGATCCGGGGGCCGCAGTACACC
34381 + + + + + + 34440
CCTCCCGACATCAAGGGGCTGCCACATGACGCCGGGGGCTAGGCCCCCGGCGTCATGTGG
4-* G G L * -
GTCGGGGCGGCTGGTGCTCAGCCGCGCAGGAATCCGATGAGCTCGGGGGCGAGCTTCTTG
34441 + + + + + + 34500
CAGCCCCGCCGACCACGAGTCGGCGCGTCCTTAGGCTACTCGAGCCCCCGCTCGAAGAAC
22-* *GRLFGILEPALKK-
GGCGCCATGGCGACGGCACCGTGGTTGAGCCCGTTCAGGGTGCGGTGGCTCGCGTCGGGG
34501 + + + + + + 34560
CCGCGGTACCGCTGCCGTGGCACCAACTCGGGCAAGTCCCACGCCACCGAGCGCAGCCCC
22 PAMAVAGHNLGNLTRHSADP-
AGGACTCCGGTGAGTTCCTTCGCGGCACGCTGGAAACCGTCGGGGCTCTTGGAACCGGTC
34561 + + + + + + 34620
TCCTGAGGCCACTCAAGGAAGCGCCGTGCGACCTTTGGCAGCCCCGAGAACCTTGGCCAG
22 LVGTLEKAARQFGDPSKSGT-
AGCACCAGGGTCGGGGCCGACGCCGCCGACCACGGCTCGGCGGGGAGCGGCTTGCCCTGC
34621 + + + + + + 34680
TCGTGGTCCCAGCCCCGGCTGCGGCGGCTGGTGCCGAGCCGCCCCTCGCCGAACGGGACG
22 LVLTPASAASWPEAPLPKGQ
TGGGTGTCGCCCATCACCGCGATGTCGTAGGGAAGCGTGTTGGCCAGACCCTTGAGGTTG
34681 + + + + + + 34740
ACCCACAGCGGGTAGTGGCGCTACAGCATCCCTTCGCACAACCGGTCTGGGAACTCCAAC
22 QTDGMVAIDYPLTNALGKLN-
GACCAGACACCGGGCATCAGGCGCATGGCGCCGACCATGAAGGAGGGCATGCCCTGTGCC
34741 + + + + + + 34800
CTGGTCTGTGGCCCGTAGTCCGCGTACCGCGGCTGGTACTTCCTCCCGTACGGGACACGG
22 SWVGPMLRMAGVMFSPMGQA-
TTGACCATGAAGGCCTTGACCGCGTCGCTGCGTCGGTCCTCCGCCAGAAGGCTGTCGATC
34801 + + + + + + 34860
AACTGGTACTTCCGGAACTGGCGCAGCGACGCAGCCAGGAGGCGGTCTTCCGACAGCTAG
22 KVMFAKVADSRRDEALLSDI
TGACCGCCGAAGCCGGCGGGCGGGCCGAAGCCGTCCGAGGTGACGGAGAACGGCGGCTCG
34861 + + + + + + 34920
ACTGGCGGCTTCGGCCGCCCGCCCGGCTTCGGCAGGCTCCACTGCCTCTTGCCGCCGAGC
22 QGGFGAPPGFGDSTVSFPPE
TAGACCGCGAGCTTGTTCACCTTCAGGCCGGCGGCGGCGGCTCGCAGGGCGAGCACCGCG
34921 + + + + + + 34980
ATCTGGCGCTCGAACAAGTGGAAGTCCGGCCGCCGCCGCCGAGCGTCCCGCTCGTGGCGC
22 YVALKNVKLGAAAARLALVA
CCGGAAGAGCTGCCGAACAGGGAGGCCGAACCGCCGACCTGGTCGATCAGCGCCGCGATG
34981 + + + + + + 35040
GGCCTTCTCGACGGCTTGTCCCTCCGGCTTGGCGGCTGGACCAGCTAGTCGCGGCGCTAC
22 GSSSGFLSASGGVQDILAAI-
TCCTCGATCTCGCGCTCGACCGCGTACGCCGGACCGTCGGCGCTGGCGCCGCGGCCCCGA
35041 + + + + + + 35100
AGGAGCTAGAGCGCGAGCTGGCGCATGCGGCCTGGCAGCCGCGACCGCGGCGCCGGGGCT
22 DEIEREVAYAPGDASAGRGR
CGGTCGTAGTTGACGACCGTGAAGTGCTCGGCGAGGAGACCGGCGAGCTTCTTGGCGTCG
35101 + + + + + + 35160
GCCAGCATCAACTGCTGGCACTTCACGAGCCGCTCCTCTGGCCGCTCGAAGAACCGCAGC
22 RDYNVVTFHEALLGALKKAD
GAGCGGTCGGCCAAGGCGGAGGCCACCAGGATCACCGCCGGCCCCTCGCCCGACTTGTCG
35161 + + + + + + 35220
CTCGCCAGCCGGTTCCGCCTCCGGTGGTCCTAGTGGCGGCCGGGGAGCGGGCTGAACAGC
22 SRDALASAVLIVAPGEGSKD-
AAGGCGATCGTGGTGCCGTCGGCCGATACCGTCGTTGATTCCACCTTGGCTGCTTTCTCA
35221 + + + + + + 35280
TTCCGCTAGCACCACGGCAGCCGGCTATGGCAGCAACTAAGGTGGAACCGACGAAAGAGT
22 FIATTGDASVTTSEVKAAKE
CGGGTTGAAGACATAGCTTCCCTCAGATCACATTGTGGGGCGTGCTGCCGACAGTGGAGA
35281 + + + + + + 35340
GCCCAACTTCTGTATCGAAGGGAGTCTAGTGTAACACCCCGCACGACGGCTGTCACCTCT
22-< R T S S M -
CCGGCGTCCGGAGGAAAAGTAATCGGTCCTGCCAGAATTGGGGGTTCCGGAGGGCACGCC
35341 + + + + + + 35400
GGCCGCAGGCCTCCTTTTCATTAGCCAGGACGGTCTTAACCCCCAAGGCCTCCCGTGCGG
GACCGCTGCACGACGGCGCGCCCCGACCTTCCGGACATTGTCGTGCCCTCAGATGTGTTT
35401 + + + + + + 35460
CTGGCGACGTGCTGCCGCGCGGGGCTGGAAGGCCTGTAACAGCACGGGAGTCTACACAAA
CGCATCTTCAGGAGTGCTCAGTGATCCGTGAGGTGAGAAAGGGACGGTGGTCCGGTCAGT
35461 + + + + + + 35520
GCGTAGAAGTCCTCACGAGTCACTAGGCACTCCACTCTTTCCCTGCCACCAGGCCAGTCA
18-* *
CGTTGCCGCGCGGGCTGTTCTGGTAAGCGGCCAGACGCCACTGCCCGTCCTGTTCGACGG
35521 + + + + + + 35580
GCAACGGCGCGCCCGACAAGACCATTCGCCGGTCTGCGGTGACGGGCAGGACAAGCTGCC
18 DNGRPSNQYAALRWQGDQEV
CCAGCCAGGAGGCCCGGACGGCGCCGTCGCCGCTCGCCTCGGTCTCCCCCGGGGCGAGGA
35581 + + + + + + 35640
GGTCGGTCCTCCGGGCCTGCCGCGGCAGCGGCGAGCGGAGCCAGAGGGGGCCCCGCTCCT
18 ALWSARVAGDGSAETEGPAL
TGCCGCCCTCGGTGATGAGCAGGGCGATGCCGTCGCCGAGCAGGCGCGCGTCGATGGGGC
35641 + + + + + + 35700
ACGGCGGGAGCCACTACTCGTCCCGCTACGGCAGCGGCTCGTCCGCGCGCAGCTACCCCG
18 IGGETILLAIGDGLLRADIP
TGCCGATGACACGGGTGCCCTTGTACGGGCCCGCGAAGGCGGCCGCCATGTGGGTGCGGA
35701 + + + + + + 35760
ACGGCTACTGTGCCCACGGGAACATGCCCGGGCGCTTCCGCCGGCGGTACACCCACGCCT
18 SGIVRTGKYPGAFAAAMHTR
TGTTCTCGCGGCCCTTGCGGAAGAGGCCGGGGAGGATCATCGTCCCGTCCTCGGCGAAGA
35761 + + + + + + 35820
ACAAGAGCGCCGGGAACGCCTTCTCCGGCCCCTCCTAGTAGCAGGGCAGGAGCCGCTTCT
18 INERGKRFLGPLIMTGDEAF
CGTCGGCGAACCGGTCGGCGTCGTGGTCGGCCCAGGCGGCCACGATGCGCGCCGGCAGAG
35821 + + + + + + 35880
GCAGCCGCTTGGCCAGCCGCAGCACCAGCCGGGTCCGCCGGTGCTACGCGCGGCCGTCTC
18 VDAFRDADHDAWAAVIRAPL
CGGCTACCGCTGCCAGGGCGGCGTCGGGAGCGGAGGTGGTCGAGTCGGTGCTGGTCATAT
35881 + + + + + + 35940
GCCGATGGCGACGGTCCCGCCGCAGCCCTCGCCTCCACCAGCTCAGCCACGACCAGTATA
-< AAVAALAADPASTTSDTSTM
CGCGGTTCCCGTCCGTTGGTTGGCGGTTTCGGCACGGCCCGCAGCCCTGCCCGAGCCCGA
35941 + + + + + + 36000
GCGCCAAGGGCAGGCAACCAACCGCCAAAGCCGTGCCGGGCGTCGGGACGGGCTCGGGCT
CGCTGGCAGGCGGCCCCGTCATCAGGCATCTCCTGCGTTGCGCCCCACGCCAGTCACTTC
36001 + + + + + + 36060
GCGACCGTCCGCCGGGGCAGTAGTCCGTAGAGGACGCAACGCGGGGTGCGGTCAGTGAAG
ACGGCCAGAACAAGTCGCGCATTCTGGAAGAAGCTGAGGCCCGCGACCCGGTGCGACGAT
36061 + + + + + + 36120
TGCCGGTCTTGTTCAGCGCGTAAGACCTTCTTCGACTCCGGGCGCTGGGCCACGCTGCTA
CTGCGGTGTCACGGAGTTCGCACACGTTTACGCACGGAGGCTCGATGCCCGCTGTCAATG
36121 + + + + + + 36180
GACGCCACAGTGCCTCAAGCGTGTGCAAATGCGTGCCTCCGAGCTACGGGCGACAGTTAC
> MPAVNG-
GATCGGTGCAGTCAGGCCAGTCGCACCGACGCTCCGTCGTGGCGACGGTGGTGGGCAACT
36181 + + + + + + 36240
CTAGCCACGTCAGTCCGGTCAGCGTGGCTGCGAGGCAGCACCGCTGCCACCACCCGTTGA
SVQSGQSHRRSVVATVVGNF-
TCGTGGAGTCGTTCGACTGGCTCGCCTACGGGCTCTTCGCTCCTCTCTTCGCGGCTCAGT
36241 + + + + + + 36300
AGCACCTCAGCAAGCTGACCGAGCGGATGCCCGAGAAGCGAGGAGAGAAGCGCCGAGTCA
VESFDWLAYGLFAPLFAAQF-
TCTTCCCCTCGTCCAACCAGTTCACCTCCCTGCTCGGCGCGTTCGCGGTCTTCGGCACGG
36301 + + + + + + 36360
AGAAGGGGAGCAGGTTGGTCAAGTGGAGGGACGAGCCGCGCAAGCGCCAGAAGCCGTGCC
FPSSNQFTSLLGAFAVFGTG-
GCATGCTCTTCCGGCCGATCGGCGGGGTCCTGCTGGGCCGCCTCGCCGACCGGCGCGGCC
36361 + + + + + + 36420
CGTACGAGAAGGCCGGCTAGCCGCCCCAGGACGACCCGGCGGAGCGGCTGGCCGCGCCGG
MLFRPIGGVLLGRLADRRGR-
GGCGCCCCGCCCTGATGCTGGCGATCGGACTGATGACCGGCGGCTCGACCCTGATCGCCG
36421 + + + + + + 36480
CCGCGGGGCGGGACTACGACCGCTAGCCTGACTACTGGCCGCCGAGCTGGGACTAGCGGC
RPALMLAIGLMTGGSTLIAV-
TCGTCCCCACCTACGAGCACATCGGGATCCTCGCCCCGCTGCTTCTGCTGCTCGCCCGGC
36481 + + + + + + 36540
AGCAGGGGTGGATGCTCGTGTAGCCCTAGGAGCGGGGCGACGAAGACGACGAGCGGGCCG
VPTYEHIGILAPLLLLLARL-
TCGCCCAGGGAGTCTCCTCGGGCGGGGAATGGACAGCGGCGGCCACCTACCTGATGGAGA
36541 + + + + + + 36600
AGCGGGTCCCTCAGAGGAGCCCGCCCCTTACCTGTCGCCGCCGGTGGATGGACTACCTCT
AQGVSSGGEWTAAATYLMEI-
TCGCGCCGAAGAACCGCCGGTGCCTCTACAGCAGCCTCTTCTCCGTGACGACCATGGCGG
36601 + + + + + + 36660
AGCGCGGCTTCTTGGCGGCCACGGAGATGTCGTCGGAGAAGAGGCACTGCTGGTACCGCC
APKNRRCLYSSLFSVTTMAG-
GCCCCTTCGTCGCATCGCTGCTGGGCGCGGGCCTCGGCGTGTGGCTGGGAACCGCGACGA
36661 + + + + + + 36720
CGGGGAAGCAGCGTAGCGACGACCCGCGCCCGGAGCCGCACACCGACCCTTGGCGCTGCT
PFVASLLGAGLGVWLGTATM-
TGGAGGCCTGGGGCTGGCGGGTGCCGTTCCTCCTCGGCGGCGTCTTCGGCGTGATCCTGC
36721 + + + + + + 36780
ACCTCCGGACCCCGACCGCCCACGGCAAGGAGGAGCCGCCGCAGAAGCCGCACTAGGACG
5 EAWGWRVPFLLGGVFGVILL-
TGTTCCTGCGCCGTCGGCTCACCGAGACCGAGGTCTTCCGCCGGGAGGTGCGGCCCCGGG
36781 + + + + + + 36840
ACAAGGACGCGGCAGCCGAGTGGCTCTGGCTCCAGAAGGCGGCCCTCCACGCCGGGGCCC
5 FLRRRLTETEVFRREVRPRA-
CCCGGCGCGGCTCACTGGGCCAGCTGATCGGAGCCCACCGCCCCCAGGTGCTGCTGGCCG
36841 + + + + + + 36900
GGGCCGCGCCGAGTGACCCGGTCGACTAGCCTCGGGTGGCGGGGGTCCACGACGACCGGC
5 RRGSLGQLIGAHRPQVLLAV-
TGATGCTGGTGGCCGGACTGGGCGTCATCGGCGGAACGTGGTCGACCGCGGTCCCGGCGA
36901 + + + + + + 36960
ACTACGACCACCGGCCTGACCCGCAGTAGCCGCCTTGCACCAGCTGGCGCCAGGGCCGCT
5 MLVAGLGVIGGTWSTAVPAM-
TGGGCCACCGTCTGATCGGCTCGCAGACGATGTTCTGGGTGGTGGTCTGTGTGACCGGCT
36961 + + + + + + 37020
ACCCGGTGGCAGACTAGCCGAGCGTCTGCTACAAGACCCACCACCAGACACACTGGCCGA
5 GHRLIGSQTMFWVVVCVTGS-
CGGTCATCCTGCTGCAGGTACCCATAGGGCTGCTCGCCGACCGGGTGGAACCGGGCAGGT
37021 + + + + + + 37080
GCCAGTAGGACGACGTCCATGGGTATCCCGACGAGCGGCTGGCCCACCTTGGCCCGTCCA
5 VILLQVPIGLLADRVEPGRF-
TCCTGATCGTCTCCAGCGTCGTCTTCGCCGCTGTGGGCTCGTACGCCTACCTCACCGTCC
37081 + + + + + + 37140
AGGACTAGCAGAGGTCGCAGCAGAAGCGGCGACACCCGAGCATGCGGATGGAGTGGCAGG
5 LIVSSVVFAAVGSYAYLTVQ-
AGGACTCCTTCGCGAGCCTGGCGTTCACGTACAGCACCGGAGTGATCTTCCTCGGCTGCG
37141 + + + + + + 37200
TCCTGAGGAAGCGCTCGGACCGCAAGTGCATGTCGTGGCCTCACTAGAAGGAGCCGACGC
5 DSFASLAFTYSTGVIFLGCV-
TCACCATGGTGCTGCCGAAGATGCTCTCCAGAATCTTCCCTCCGCAGATACGCGGCCTGG
37201 + + + + + + 37260
AGTGGTACCACGACGGCTTCTACGAGAGGTCTTAGAAGGGAGGCGTCTATGCGCCGGACC
5 TMVLPKMLSRIFPPQIRGLG-
GCATCGGGCTGCCGCACGCCTCGACCACCGCACTCCTCGGCGGGGCGGGGCCACTGCTGG
37261 + + + + + + 37320
CGTAGCCCGACGGCGTGCGGAGCTGGTGGCGTGAGGAGCCGCCCCGCCCCGGTGACGACC
5 IGLPHASTTALLGGAGPLLA-
CCGCCTACTCCGACGAGCGAGGCGCCTCGGGCTGGTTCATCGCCGCCGTGATGGCCGCGG
37321 + + + + + + 37380
GGCGGATGAGGCTGCTCGCTCCGCGGAGCCCGACCAAGTAGCGGCGGCACTACCGGCGCC
5 AYSDERGASGWFIAAVMAAV-
TCCTGCTCGCCTGGCCGGCCACCCTGTGGGAGCGACGGCTGTTCCGCGCCCGGACGGCCC
37381 + + + + + + 37440
AGGACGAGCGGACCGGCCGGTGGGACACCCTCGCTGCCGACAAGGCGCGGGCCTGCCGGG
5 LLAWPATLWERRLFRARTAP-
CGGGAAGCGAGCCGGTTCCCGAATCCGCCGTCGCCCGCCCCGTCGGGTGACCGTCCGCAC
37441 + + + + + + 37500
GCCCTTCGCTCGGCCAAGGGCTTAGGCGGCAGCGGGCGGGGCAGCCCACTGGCAGGCGTG
5-* GSEPVPESAVARPVG*-
TTCTGCATCCCGTCCGGCACCGAGCGCCGGCGACCTTCCCGACTGAGAGGTTGACATCAT
37501 + + + + + + 37560
AAGACGTAGGGCAGGCCGTGGCTCGCGGCCGCTGGAAGGGCTGACTCTCCAACTGTAGTA
23-> M -
GACGACGTCCGACACCACCGACCGGTCCCAGGACGGCGTGCCGCCGCTCTCCTTCCACCA
37561 + + + + + + 37620
CTGCTGCAGGCTGTGGTGGCTGGCCAGGGTCCTGCCGCACGGCGGCGAGAGGAAGGTGGT
23 TTSDTTDRSQDGVPPLSFHQ-
GGAGTTCCTGTGCATGTTCGACAGCGGGAACGACGGCGCCGACGTGGGGCCGTTCGGCCC
37621 + + + + + + 37680
CCTCAAGGACACGTACAAGCTGTCGCCCTTGCTGCCGCGGCTGCACCCCGGCAAGCCGGG
23 EFLCMFDSGNDGADVGPFGP-
CATGTACCACATCGTCGGAGCCTGGCGGCTGACCGGCGGGATCGACGAGGAGACCCTGCG
37681 + + + + + + 37740
GTACATGGTGTAGCAGCCTCGGACCGCCGACTGGCCGCCCTAGCTGCTCCTCTGGGACGC
23 MYHIVGAWRLTGGIDEETLR
CGAGGCGCTGGGTGACGTCGTCGTGCGCCACGAGGCCCTGCGCACATCGCTGGTCCGCGA
37741 + + + + + + 37800
GCTCCGCGACCCACTGCAGCAGCACGCGGTGCTCCGGGACGCGTGTAGCGACCAGGCGCT
23 EALGDVVVRHEALRTSLVRE-
AGGTGGCACGCACCGGCCGGAGATCCTGCCTGCGGGGCCCGCCGCGCTGGAGGTCCGTGA
37801 + + + + + + 37860
TCCACCGTGCGTGGCCGGCCTCTAGGACGGACGCCCCGGGCGGCGCGACCTCCAGGCACT
23 GGTHRPEILPAGPAALEVRD
TCTCGGCGACGTCGACGAGTCGGAGCGGGTGCGGCGCGGTGAGGAACTGCTCAACGAGGT
37861 + + + + + + 37920
AGAGCCGCTGCAGCTGCTCAGCCTCGCCCACGCCGCGCCACTCCTTGACGAGTTGCTCCA
23 LGDVDESERVRRGEELLNEV-
GGAGTCGACCGGTCTGAGCGTGCGGGAGCTGCCCCTGCTGCGGGCCGTGCTCGGACGCTT
37921 + + + + + + 37980
CCTCAGCTGGCCAGACTCGCACGCCCTCGACGGGGACGACGCCCGGCACGAGCCTGCGAA
23 ESTGLSVRELPLLRAVLGRF
CGACCAGAAGGACGCGGTGCTGGTCCTCATCGCCCACCACACCGCCGCGGACGCCTGGGC
37981 + + + + + + 38040
GCTGGTCTTCCTGCGCCACGACCAGGAGTAGCGGGTGGTGTGGCGGCGCCTGCGGACCCG
23 DQKDAVLVLIAHHTAADAWA
CATGCACGTCATCGCCCGCGACCTGCTCAACCTGTACGCCGCCAGGCGCGGGAACCCGGT
38041 + + + + + + 38100
GTACGTGCAGTAGCGGGCGCTGGACGAGTTGGACATGCGGCGGTCCGCGCCCTTGGGCCA
23 MHVIARDLLNLYAARRGNPV-
TCCCCCGCTCCCCGAGCCGGCCCAGCATGCCGAGTTCGCCCGCTGGGAGCGCGAGGCGGC
38101 + + + + + + 38160
AGGGGGCGAGGGGCTCGGCCGGGTCGTACGGCTCAAGCGGGCGACCCTCGCGCTCCGCCG
23 PPLPEPAQHAEFARWEREAA-
CGAGGCACCGCGGGTCGCGGTCTCGAAGGAATTCTGGCGCAAGCGCCTCCAGGGCGCGCG
38161 + + + + + + 38220
GCTCCGTGGCGCCCAGCGCCAGAGCTTCCTTAAGACCGCGTTCGCGGAGGTCCCGCGCGC
23 EAPRVAVSKEFWRKRLQGAR
GATCATCGGGCTGGAGACGGACATACCGCGCTCGGCGGGGCTGCCCAAGGGCACCGCGTG
38221 + + + + + + 38280
CTAGTAGCCCGACCTCTGCCTGTATGGCGCGAGCCGCCCCGACGGGTTCCCGTGGCGCAC
23 IIGLETDIPRSAGLPKGTAW
GCAGCGCTTCGCCGTACGCGGGGAACTGGCCGACGCCGTGGTGGAGTTCTCACGGGCCGC
38281 + + + + + + 38340
CGTCGCGAAGCGGCATGCGCCCCTTGACCGGCTGCGGCACCACCTCAAGAGTGCCCGGCG
23 QRFAVRGELADAVVEFSRAA-
CAAGTGCTCCCCGTTCATGACCATGTTCGCCGCCTACCAGGTGCTGCTGCACCGCAGGAC
38341 + + + + + + 38400
GTTCACGAGGGGCAAGTACTGGTACAAGCGGCGGATGGTCCACGACGACGTGGCGTCCTG
23 KCSPFMTMFAAYQVLLHRRT
GGGCGAGCTGGACATCACCGTGCCGACCTTCTCCGGGGGGCGCAACAACTCGCGGTTCGA
38401 + + + + + + 38460
CCCGCTCGACCTGTAGTGGCACGGCTGGAAGAGGCCCCCCGCGTTGTTGAGCGCCAAGCT
23 GELDITVPTFSGGRNNSRFE-
GGACACCGTCGGTTCCTTCATCAACTTCCTGCCGCTGCGTACCGACCTCTCCGGATGCGC
38461 + + + + + + 38520
CCTGTGGCAGCCAAGGAAGTAGTTGAAGGACGGCGACGCATGGCTGGAGAGGCCTACGCG
23 DTVGSFINFLPLRTDLSGCA-
ATCCTTCCGCGAGGTCGTGCTGCGCACCCGCACCACCTGCGGAGAGGCGTTCACCCACGA
38521 + + + + + + 38580
TAGGAAGGCGCTCCAGCACGACGCGTGGGCGTGGTGGACGCCTCTCCGCAAGTGGGTGCT
23 SFREVVLRTRTTCGEAFTHE
GCTGCCCTTCTCCCGGCTGATCCCGGAGGTGCCGGAGCTGATGGCGTCGGCGGCCTCCGA
38581 + + + + + + 38640
CGACGGGAAGAGGGCCGACTAGGGCCTCCACGGCCTCGACTACCGCAGCCGCCGGAGGCT
23 LPFSRLIPEVPELMASAASD
CAACCACCAGATCTCCGTCTTCCAGGCCGTGCACGCGCCCGCGTCCGAGGGGCCCGAGCA
38641 + + + + + + 38700
GTTGGTGGTCTAGAGGCAGAAGGTCCGGCACGTGCGCGGGCGCAGGCTCCCCGGGCTCGT
23 NHQISVFQAVHAPASEGPEQ-
GGCCGGGGACCTGACGTACTCGAAGATCTGGGAGCGGCAGCTGTCGCAGGCGGAGGGCTC
38701 + + + + + + 38760
CCGGCCCCTGGACTGCATGAGCTTCTAGACCCTCGCCGTCGACAGCGTCCGCCTCCCGAG
23 AGDLTYSKIWERQLSQAEGS-
CGACATCCCCGACGGGGTGCTGTGGTCGATCCACATCGACCCCTCGGGCTCCATGGCCGG
38761 + + + + + + 38820
GCTGTAGGGGCTGCCCCACGACACCAGCTAGGTGTAGCTGGGGAGCCCGAGGTACCGGCC
23 DIPDGVLWSIHIDPSGSMAG-
CAGCCTCGGGTACAACACCAACCGCTTCAAGGACGAGACGATGGCGGCCTTCCTGGCCGA
38821 + + + + + + 38880
GTCGGAGCCCATGTTGTGGTTGGCGAAGTTCCTGCTCTGCTACCGCCGGAAGGACCGGCT
23 SLGYNTNRFKDETMAAFLAD-
CTACCTCGACGTGCTCGAGAACGCGGTGGCCCGGCCGGACGCCCCCTTCACCTCCTGAGA
38881 + + + + + + 38940
GATGGAGCTGCACGAGCTCTTGCGCCACCGGGCCGGCCTGCGGGGGAAGTGGAGGACTCT
23-* YLDVLENAVARPDAPFTS *-
CAGTTCCGGCGGCGGCGAACCCGCCCGAAGAAAGGAAAGCCAGTGTCCACCGTTTCCGAC
38941 + + + + + + 39000
GTCAAGGCCGCCGCCGCTTGGGCGGGCTTCTTTCCTTTCGGTCACAGGTGGCAAAGGCTG
26-> M S T V S D
ACAGCGGCCGGCTCCTCCCTGGAGGAGAAGGTCACCCGGATCTGGACGGGTGTTCTCGGC
39001 + + + + + + 39060
TGTCGCCGGCCGAGGAGGGACCTCCTCTTCCAGTGGGCCTAGACCTGCCCACAAGAGCCG
26 TAAGSSLEEKVTRIWTGVLG
ACGTCCGGTGAGGAAGGCGCGACGTTCATCGAGCTCGGAGGGCAGTCGGTCTCGGCCGTG
39061 + + + + + + 39120
TGCAGGCCACTCCTTCCGCGCTGCAAGTAGCTCGAGCCTCCCGTCAGCCAGAGCCGGCAC
TSGEEGATFIELGGQSVSAV
CGCATCGCCACGCGTATCCAGGAGGAGCTCGACATCTGGGTCGACATCGGCGTCCTCTTC
39121 + + + + + + 39180
GCGTAGCGGTGCGCATAGGTCCTCCTCGAGCTGTAGACCCAGCTGTAGCCGCAGGAGAAG
RIATRIQEELDIWVDIGVLF
GACGACCCGGATCTGCCTACCTTCATCGCGGCGGTCGTCCGGACGGCCGACGCCGCGGGC
39181 + + + + + + 39240
CTGCTGGGCCTAGACGGATGGAAGTAGCGCCGCCAGCAGGCCTGCCGGCTGCGGCGCCCG
DDPDLPTFIAAVVRTADAAG
GGCGAGGGCTCCGGAACGCAGTGAGACTCGCCGGGCGCCGTCTCCCCGCGGCGCCCGGTT
39241 + + + + + + 39300
CCGCTCCCGAGGCCTTGCGTCACTCTGAGCGGCCCGCGGCAGAGGGGCGCCGCGGGCCAA
* GEGSGTQ*-
TCACATGGCTGAGGCGGTTCACCCGGTACCGGGTGAACCGCCTCAGCCATGTGAAACCGG
39301 + + + + + + 39360
AGTGTACCGACTCCGCCAAGTGGGCCATGGCCCACTTGGCGGAGTCGGTACACTTTGGCC
GCCTGGTCAGCGCAGCTGGATGTCCGTCTCCCGGGCGATCGCCCGGAGGAACTCGCCGCG
39361 + + + + + + 39420
CGGACCAGTCGCGTCGACCTACAGGCAGAGGGCCCGCTAGCGGGCCTCCTTGAGCGGCGC
24-* *RLQIDTERAIARLFEGR-
GGACAGCGCGTCGGCGACCAGCTCGATGTCGTCGGCCATGTACCGGTCGACGCCCAGCGT
39421 + + + + + + 39480
CCTGTCGCGCAGCCGCTGGTCGAGCTACAGCAGCCGGTACATGGCCAGCTGCGGGTCGCA
SLADAVLEIDDAMYRDVGLT-
CGGAACCAGCCGGCGCACCGCTTCGTACGTGGCCTTCGCCGCCGGGCTCAAGCCGTCGAA
39481 + + + + + + 39540
GCCTTGGTCGGCCGCGTGGCGAAGCATGCACCGGAAGCGGCGGCCCGAGTTCGGCAGCTT
PVLRRVAEYTAKAAPSLGDF-
CCGGCCGGAGATGTCGACCGCCTGGGCGGCGGCCAGGTACTCCACCGCGAGGATCTTGTT
39541 + + + + + + 39600
GGCCGGCCTCTACAGCTGGCGGACCCGCCGCCGGTCCATGAGGTGGCGCTCCTAGAACAA
RGSIDVAQAAALYEVALIKN-
GTTGTTCGACAGGACCCGGCGGGCGTTGCGGGCCGAGATCAGGCCCATGCTCACCACGTC
39601 + + + + + + 39660
CAACAAGCTGTCCTGGGCCGCCCGCAACGCCCGGCTCTAGTCCGGGTACGAGTGGTGCAG
NNSLVRRANRASILGMSVVD-
CTGGTTGTCGCCGTTGGACGGGACGCTCTGGGTGCTGGCCGGGCCGATCGTCCGGTTCTC
39661 + + + + + + 39720
GACCAACAGCGGCAACCTGCCCTGCGAGACCCACGACCGGCCCGGCTAGCAGGCCAAGAG
QNDGNSPVSQTSAPGITRNE-
GGCCACCAGTGCGGTGGCCGGGTACTGGGCGCCGGCGAATCCGCTGTGCAGCCCCGGGTC
39721 + + + + + + 39780
CCGGTGGTCACGCCACCGGCCCATGACCCGCGGCCGCTTAGGCGACACGTCGGGGCCCAG
AVLATAPYQAGAFGSHLGPD-
CCCGGAGACGAGGAACTCCGGGAGGCCGTAGCTGAGGTGCCGGTTCAGGACCCGGTTGAT
39781 + + + + + + 39840
GGGCGTCTGCTCCTTGAGGCCCTCCGGCATCGACTCCACGGCCAAGTCCTGGGCCAACTA
GSVLFEPLGYSLHRNLVRNI-
CTGCCGCTCGGCCAGGACGCCGAGCTGGGTGAGCGCGATGGTCACGAAGTCCATCGCGAA
39841 + + + + + + 39900
GACGGCGAGCCGGTCCTGCGGCTCGACCCACTCGCGCTACCAGTGCTTCAGGTAGCGCTT
24 QREALVGLQTLAITVFDMAF-
CGCGATCGGCTGACCGTGGAAGTTCGCCCCGTGGAAGATCTCCTTGCCCTCGAAGAAGAG
39901 + + + + + + 39960
GCGCTAGCCGACTGGCACCTTCAAGCGGGGCACCTTCTAGAGGAACGGGAGCTTCTTCTC
24 AIPQGHFNAGHFIEKGEFFL-
CGGGTTGTCGTTGGCCGAGTTGAGCTCGATGCGCAGCTTGTGCCGCGCGTGGTACAAGGT
39961 + + + + + + 40020
GCCCAACAGCAACCGGCTCAACTCGAGCTACGCGTCGAACACGGCGCGCACCATGTTCCA
24 PNDNASNLEIRLKHRAHYLT-
GTCGCGCACCGCCCCGACGACCTGGGGGATGGCCCGCAGCGAGTAGGCCTTCTGCAGGTA
40021 + + + + + + 40080
CAGCGCGTGGCGGGGCTGCTGGACCCCCTACCGGGCGTCGCTCATCCGGAAGACGTCCAT
24 DRVAGVVQPIARLSYAKQLY-
GATCTCCGAGCGCTGGACGTCCTTGCCGGCCTCCTTGTCCTTCTGGAGTTCTCGGCGCAG
40081 + + + + + + 40140
CTAGAGGCTCGCGACCTGCAGGAACGGCCGGAGGAACAGGAAGACCTCAAGAGCCGCGTC
24 IESRQVDKGAEKDKQLERRL-
GTCGGCGTGCTCGACCGTCAGTCCGCTGCCCCGCATCAGGGCCCGCATGTTGGCGGCGGT
40141 + + + + + + 40200
CAGCCGCACGAGCTGGCAGTCAGGCGACGGGGCGTAGTCCCGGGCGTACAACCGCCGCCA
24 DAHEVTLGSGRMLARMNAAT-
GTCGATCTGGCCCTCGTGCGGGCGGGCTATGTCGTGCCCCTCCGCGAGGAAGGGGCTGGT
40201 + + + + + + 40260
CAGCTAGACCGGGAGCACGCCCGCCCGATACAGCACGGGGAGGCGCTCCTTCCCCGACCA
24 DIQGEHPRAIDHGEALFPST-
CGATCCGCGTACCGCCTCGATGAGCAGAGCCGTCACGATCTCGGCCTGCTGGGCCTGCTC
40261 + + + + + + 40320
GCTAGGCGCATGGCGGAGCTACTCGTCTCGGCAGTGCTAGAGCCGGACGACCCGGACGAG
24 SGRVAEILLATVIEAQQAQE-
CAGGGCCCGTCCGACGACCAGGGAGCCCAGACCGGTCATCCCGGACGTGCCGTTGATCAG
40321 + + + + + + 40380
GTCCCGGGCAGGCTGCTGGTCCCTCGGGTCTGGCCAGTAGGGCCTGCACGGCAACTAGTC
24 LARGVVLSGLGTMGSTGNIL-
TGCGAGGCCCTCCTTGAAGCGCAGTTCGAGCGGCTCGATGCCCCGCTCGGCCAGCACCTG
40381 + + + + + + 40440
ACGCTCCGGGAGGAACTTCGCGTCAAGCTCGCCGAGCTACGGGGCGAGCCGGTCGTGGAC
24 ALGEKFRLELPEIGREALVQ-
GGCGGTCTCCACCGGCCGTCCGTCGCGCAGGACGTAGCCCTCTCCGATGAGGGTGCTCGC
40441 + + + + + + 40500
CCGCCAGAGGTGGCCGGCAGGCAGCGCGTCCTGCATCGGGAGAGGCTACTCCCACGAGCG
24 ATEVPRGDRLVYGEGILTSA-
GACGTGGGAGAGGGGAGCCAGGTCGCCGCTCGCCCCGAGTGACCCGATCTCGGGTATGGC
40501 + + + + + + 40560
CTGCACCCTCTCCCCTCGGTCCAGCGGCGAGCGGGGCTCACTGGGCTAGAGCCCATACCG
24 VHSLPALDGSAGLSGIEPIA-
CGGGGTGATGCCCTCGTTCAGGTACTGCGCGAGGCGTTCGAGGATGATGGGGCGCACCGC
40561 + + + + + + 40620
GCCCCACTACGGGAGCAAGTCCATGACGCGCTCCGCAAGCTCCTACTACCCCGCGTGGCG
24 PTIGENLYQALRELIIPRVA-
GGAGTGGCCCTTGGCGAGGGTGTTCAGCCGGGCGGCGACGATCGCCCGCGCCTCGTCCTC
40621 + + + + + + 40680
CCTCACCGGGAACCGCTCCCACAAGTCGGCCCGCCGCTGCTAGCGGGCGCGGAGCAGGAG
24 SHGKALTNLRAAVIARAEDE-
GGCGAACAGCGGACCGACTCCCGCGCTGTGGCTACGGACGAGATTGGTCTGCAGTTCGAC
40681 + + + + + + 40740
CCGCTTGTCGCCTGGCTGAGGGCGCGACACCGATGCCTGCTCTAACCAGACGTCAAGCTG
24 AFLPGVGASHSRVLNTQLEV-
TTCCTTCGACTTGTCGACCTGCATGTAGATCATCTCGCCGTACCCGGTGGTCACCCCGTA
40741 + + + + + + 40800
AAGGAAGCTGAACAGCTGGACGTACATCTAGTAGAGCGGCATGGGCCACCAGTGGGGCAT
24 EKSKDVQMYIMEGYGTTVGY-
GATGGGGATGTTCTGTTCGGCGATCCCTTCGAAGATCTCCCGGCTCTTCTGGGCCTTCGC
40801 + + + + + + 40860
CTACCCCTACAAGACAAGCCGCTAGGGAAGCTTCTAGAGGGCCGAGAAGACCCGGAAGCG
24 IPINQEAIGEFIERSKQAKA-
GATGGATTCGGCCGGTAGGTCGACCGTCGCGCGTTCCTCCGCGACGCGGCGTACGGCTTC
40861 + + + + + + 40920
CTACCTAAGCCGGCCATGCAGCTGGCAGCGCGCAAGGAGGCGCTGCGCCGCATGCCGAAG
24 ISEAPVDVTAREEAVRRVAE-
GACGGTCAGGGTCTCGCCGTCGACGGAAACCGGGACGATCTCGGTCTCGACTTGAGTCAA
40921 + + + + + + 40980
CTGCCAGTCCCAGAGCGGCAGCTGCCTTTGGCCCTGCTAGAGCCAGAGCTGAACTCAGTT
24 VTLTEGDVSVPVIETEVQTL-
TGCCATCACTCCATGGGTAGCGGCCGAGGCCGGTGTACGACAGGTCAGGGGGTGGGTTCG
40981 + + + + + + 41040
ACGGTAGTGAGGTACCCATCGCCGGCTCCGGCCACATGCTGTCCAGTCCCCCACCCAAGC
24- < A M -
TGAGGCGCGGCTCAGCGGGTGAGCCGGGAGCGGTCCACCTTCCCCGCGGCGTTGCGCGGC
41041 + + + + + + 41100
ACTCCGCGCCGAGTCGCCCACTCGGCCCTCGCCAGGTGGAAGGGGCGCCGCAACGCGCCG
25- * *RTLRSRDVKGAANRP -
AGGCGTGAAGTCAGGCGGGTGAAGACGGCGGGCAGTGCGAGGGGGCCGAACTGGCCGCGC
41101 + + + + + + 41160
TCCGCACTTCAGTCCGCCCACTTCTGCCGCCCGTCACGCTCCCCCGGCTTGACCGGCGCG
25 LRSTLRTFVAPLALPGFQGR-
AGATGGGAACGCCAGGCCCGGATGTCCGCGCGCACGTCCTCCCGGCCCTCTCCTTGTGGC
41161 + + + + + + 41220
TCTACCCTTGCGGTCCGGGCCTACAGGCGCGCGTGCAGGAGGGCCGGGAGAGGAACACCG
25 LHSRWARIDARVDERGEGQP
ACCACGTACACGGCGAGGCGGGTCACCAGGCCCTGGCCGTTGACGTGGGGGAGGACCGCG
41221 + + + + + + 41280
TGGTGCATGTGCCGCTCCGCCCAGTGGTCCGGGACCGGCAACTGCACCCCCTCCTGGCGC
25 VVYVALRTVLGQGNVHPLVA-
CACTCCAGGACCGAGGGGTCACGGTTCAGCGCGGCCTCGATCTCGGTGAGTTCCAAGCGG
41281 + + + + + + 41340
GTGAGGTCCTGGCTCCCCAGTGCCAAGTCGCGCCGGAGCTAGAGCCACTCAAGGTTCGCC
25 CELVSPDRNLAAEIETLELR-
TTCCCGAACAGCTTGACCTGGAAGTCCTTGCGGCCCCGGAATTCCAGGGCTCCGTCGAAC
41341 + + + + + + 41400
AAGGGCTTGTCGAACTGGACCTTCAGGAACGCCGGGGCCTTAAGGTCCCGAGGCAGCTTG
25 NGFLKVQFDKRGRFELAGDF-
CGTACCCGCGCCAGATCCCCGGTCCGGTACCACCGGTCACCGTCCGGGGCGAGGCCGGCG
41401 + + + + + + 41460
GCATGGGCGCGGTCTAGGGGCCAGGCCATGGTGGCCAGTGGCAGGCCCCGCTCCGGCCGC
25 RVRALDGTRYWRDGDPALGA-
AGGGGCGCGAACAGCGCGCTGTGGTCCGGGCCGCCCTCGACGGCGAGATAACCCGGCGTC
41461 + + + + + + 41520
TCCCCGCGCTTGTCGCGCGACACCAGGCCCGGCGGGAGCTGCCGCTCTATTGGGCCGCAG
LPAFLASHDPGGEVALYGPT-
ACGTACGGGGAGCGGATCACCAGTTCGCCGGTGACGCCGGCGGGGCTCGGCCGGTCGTCC
41521 + + + + + + 41580
TGCATGCCCCTCGCCTAGTGGTCAAGCGGCCACTGCGGCCGCCCCGAGCCGGCCAGCAGG
VYPSRIVLEGTVGAPSPRDD
GCGTCCACGACGAGTACCTGGCGGCCGGGGAGCGGGTACCCGATCGGGGCCGGGCCCGTG
41581 + + + + + + 41640
CGCAGGTGCTGCTCATGGACCGCCGGCCCCTCGCCCATGGGCTAGCCCCGGCCCGGGCAC
ADVVLVQRGPLPYGIPAPGT
ACCGGCCCGGTGATCTCGTGCCAGGTCGCGGCGATCGTCTCGGTGGGCCCGTAGAGGTTG
41641 + + + + + + 41700
TGGCCGGGCCACTAGAGCACGGTCCAGCGCCGCTAGCAGAGCCACCCGGGCATCTCCAAC
VPGTIEHWTAAITETPGYLN-
ATCAGGCGGGTCCGGGGCAGGGCCGCGCGCAGTCCGTCCACGAGTTCGCCGGGCAGCGCC
41701 + + + + + + 41760
TAGTCCGCCCAGGCCCCGTCCCGGCGCGCGTCAGGCAGGTGCTCAAGCGGCCCGTCGCGG
ILRTRPLAARLGDVLEGPLA-
TCGCCCATCAGGAGCAGGTGGCCCAGGGTGCCGGGCCGATCGCCCGGGTCGGAGGCGGTG
41761 + + + + + + 41820
AGCGGGTAGTCCTCGTCCACCGGGTCCCACGGCCCGGCTAGCGGGCCCAGCCTCCGCCAC
EGMLLLHGLTGPRDGPDSAT-
ATCACTCCCAGGAGGTCCCGGGCGAAGCTGGGCACGGTCTGGAGATGAGTGATCCGCTCC
41821 + + + + + + 41880
TAGTGAGGGTCCTCCAGGGCCCGCTTCGACCCGTGCCAGACCTCTACTCACTAGGCGAGG
IVGLLDRAFSPVTQLHTIRE-
TGGACGAGCCACGGCACCAGCTTGTCGGGGTTCACCCTGACGCGCTCCGGCACCGGACAC
41881 + + + + + + 41940
ACCTGCTCGGTGCCGTGGTCGAACAGCCCCAAGTGGGACTGCGCGAGGCCGTGGCCTGTG
QVLWPVLKDPNVRVREPVPC-
AGCGTCCCGCCGGCCACGAGCGTCGCGAAGACCTCGGCCAGCGCCGGGTCGTGCTCCGGG
41941 + + + + + + 42000
TCGCAGGGCGGCCGGTGCTCGCAGCGCTTCTGGAGCCGGTCGCGGCCCAGCACGAGGCCC
LTGGAVLTAFVEALAPDHEP -
SEQ ID No. 2. C-1027 gene cluster DNA sequence from 41,980 to 63,164
AGCGCCGGGTCGTGCTCCGGGGAGACCCACTGCGCCACCCGCGCGCCCGGCCCCATCGCG
41980 + + + + + + 42039
TCGCGGCCCAGCACGAGGCCCCTCTGGGTGACGCGGTGGGCGCGCGGGCCGGGGTAGCGC
LAPDHEPSVWQAVRAGPGMA-
AACCGTTCGCCCATCCAGCCCGCGAACTGGCCCAGCGCGGCATGCGACTGGGCGATCCCC
42040 + + + + + + 42099
TTGGCAAGCGGGTAGGTCGGGCGCTTGACCGGGTCGCGCCGTACGCTGACCCGCTAGGGG
FREGMWGAFQGLAAHSQAIG-
TTGGGCCGCCCGGTCGAACCCGAGGTGAACGCCACGTAGGCCAGGTCTGCCAGGCCCGGC
42100 + + + + + + 42159
AACCCGGCGGGCCAGCTTGGGCTCCACTTGCGGTGCATCCGGTCCAGACGGTCCGGGCCG
25 KPRGTSGSTFAVYALDALGP-
CCCGCCGCGGTCGTCGCGTCCGGGCCGGCGGCGGGTCGAGGGCCGAGCACAGAGGAGGCG
42160 + + + + + + 42219
GGGCGGCGCCAGCAGCGCAGGCCCGGCCGCCGCCCAGCTCCCGGCTCGTGTCTCCTCCGC
25 GAATTADPGAAPRPGLVSSA-
TCCAGCAGGGTGGCGCCCGGTTCACCGGCGTACCAGAGCGCCAGCGGATCCTCCTGCGGA
42220 + + + + + + 42279
AGGTCGTCCCACCGCGGGCCAAGTGGCCGCATGGTCTCGCGGTCGCCTAGGAGGACGCCT
25 DLLTAGPEGAYWLALPDEQP
TCGCCGTCGAGGACCAGGCACGCCGGGCGCAGATCGCTGAGCATCGACCGGTGTCGTTCG
42280 + + + + + + 42339
AGCGGCAGCTCCTGGTCCGTGCGGCCCGCGTCTAGCGACTCGTAGCTGGCCACAGCAAGC
25 DGDLVLCAPRLDSLMSRHRE-
CCCGCGCCGTCCGGAGCGAACCACGCCAGGTGGGCGCCCGCCTCCAGGACTCCCAGCAGC
42340 + + + + + + 42399
GGGCGCGGCAGGCCTCGCTTGGTGCGGTCCACCCGCGGGCGGAGGTCCTGAGGGTCGTCG
25 GAGDPAFWALHAGAELVGLL-
ACCGCGATCCGGCGGGCGCCCGGCTGCATCCGCACCGCCACCGGCGAGCCGTGCCCCGCG
42400 + + + + + + 42459
TGGCGCTAGGCCGCCCGCGGGCCGACGTAGGCGTGGCGGTGGCCGCTCGGCACGGGGCGC
25 VAIRRAGPQMRVAVPSGHGA-
CCGGCCGCGGTGAGGGCCGAGGCGACGCGGGCCGCGTCCGCGGTCAGTTCGGCGGTCAGT
42460 + + + + + + 42519
GGCCGGCGCCACTCCCGGCTCCGCTGCGCCCGGCGCAGGCGCCAGTCAAGGCGCCAGTCA
25 GAATLASAVRAADATLEATL
TCGGCGTAGCTTGTGCGCGTGCCGCCGAACGAGACGGCGACACCGTCGTGTTCCGCGTGG
42520 + + + + + + 42579
AGCCGCATCGAACACGCGCACGGCGGCTTGCTCTGCCGCTGTGGCAGCACAAGGCGCACC
25 EAYSTRTGGFSVAVGDHEAH
CGGCGGACCGAGGCGTGCACCGGCCGCGTCATGTCCCCGCCGGACGCCCGGCGGTCCGAA
42580 + + + + + + 42639
GCCGCCTGGCTCCGCACGTGGCCGGCGCAGTACAGGGGCGGCCTGCGGGCCGCCAGGCTT
25 RRVSAHVPRTM (ORF25)
GCGCGCAGGGCGTGGTCCCGGTGGCGGTCGTCGTCCAGCGGCAGAGCGCCCACGGGTGTG
42640 + + + + + + 42699
CGCGCGTCCCGCACCAGGGCCACCGCCAGCAGCAGGTCGCCGTCTCGCGGGTGCCCACAC
TCCGGATCCGTGGTCGCGGCGGTCAGGAGGACGGCCAGCTGATCCAGCATCCGCCGGGCC
42700 + + + + + + 42759
AGGCCTAGGCACCAGCGCCGCCAGTCCTCCTGCCGGTCGACTAGGTCGTAGGCGGCCCGG
GAAGCGGGCTCGAACAGAGCTTCGCGGTACTCCAGGTAGCCGGTGACCGAGGGCGCGGTG
42760 + + + + + + 42819
CTTCGCCCGAGCTTGTCTCGAAGCGCCATGAGGTCCATCGGCCACTGGCTCCCGCGCCAC
TCCTGCAGCACCAGGGTCAGGTCGGCGGCGGCAGTGCCGTTGTGCACGGACAGCCGCCTC
42820 + + + + + + 42879
AGGACGTCGTGGTCCCAGTCCAGCCGCCGCCGTCACGGCAACACGTGCCTGTCGGCGGAG
ACCTCGGCGCCTGGTATCCGCAGGCCCGGCCGCTCCTCGTGGACGAACACGGCGTCGGCC
42880 + + + + + + 42939
TGGAGCCGCGGACCATAGGCGTCCGGGCCGGCGAGGAGCACCTGCTTGTGCCGCAGCCGG
CCCTCGATCCGGCACGGCCCGGGGGCCGGGGCCGGCGTCGTGTGCAGCAGCTCCCGGAAG
42940 + + + + + + 42999
GGGAGCTAGGCCGTGCCGGGCCCCCGGCCCCGGCCGCAGCACACGTCGTCGAGGGCCTTC
GCGGTGGCCGGCGTGCCGTCGTCCTGTCCGGCGTAGCGCTGGACCAGGGCTCGGAATCCG
43000 + + + + + + 43059
CGCCACCGGCCGCACGGCAGCAGGACAGGCCGCATCGCGACCTGGTCCCGAGCCTTAGGC
GCCAGCACCACGGCCGCGGCGGTGACCCCTTCCGCTTCGGCGAGCCGGGCCGTACGGAAG
43060 + + + + + + 43119
CGGTCGTGGTGCCGGCGCCGCCACTGGGGAAGGCGAAGCCGCTCGGCCCGGCATGCCTTC
CCGAGGTCCGGACTCCAGCCGAAGGCGACGGTGCTCCCCGCGTGCGAGGGCAGGTGCGGG
43120 + + + + + + 43179
GGCTCCAGGCCTGAGGTCGGCTTCCGCTGCCACGAGGGGCGCACGCTCCCGTCCACGCCC
CGGTTCCGGTCGGCGGGCAGGACCTGTCCGGAGGCGGTCGCCGAAGACTCCTCGCTCCCG
43180 + + + + + + 43239
GCCAAGGCCAGCCGCCCGTCCTGGACAGGCCTCCGCCAGCGGCTTCTGAGGAGCGAGGGC
GGCGCCCGGGGCGTTTGCGGCGCGGGCGCAGTGGGAGGCCGGCCGCCGGTGGTGACGGCG
43240 + + + + + + 43299
CCGCGGGCCCCGCAAACGCCGCGCCCGCGTCACCCTCCGGCCGGCGGCCACCACTGCCGC
AGGTACGCGTTCGACAACGCGGCCGGCAGGGGCCCGGACGGCCCGTCCCAGGCTCCGGAG
43300 + + + + + + 43359
TCCATGCGCAAGCTGTTGCGCCGGCCGTCCCCGGGCCTGCCGGGCAGGGTCCGAGGCCTC
TGCGAGGCCACCAGGAGAAGCAGGTGCGCGCGTGGGCCTCTGCGGGCGATGTGGAGCCGT
43360 + + + + + + 43419
ACGCTCCGGTGGTCCTCTTCGTCCACGCGCGCACCCGGAGACGCCCGCTACACCTCGGCA
GCGGGCGCGTCACCCTCGGCGAAGGGACGGGCCGCCCAGCGAGCGCAGAGTTCCTCCTCC
43420 + + + + + + 43479
CGCCCGCGCAGTGGGAGCCGCTTCCCTGCCCGGCGGGTCGCTCGCGTCTCAAGGAGGAGG
CCGCACTCCTCGTCGGCACTCGGCCCGTCCACGGCGGCCCCGTCTCCGGCGGCGGCCCGC
43480 + + + + + + 43539
GGCGTGAGGAGCAGCCGTGAGCCGGGCAGGTGCCGCCGGGGCAGAGGCCGCCGCCGGGCG
CAGGCCGTCCGCAGGGCCTCCAGGTCGAGTCCGCCGCTCACGTGGTAGGCCGCGTACGGG
43540 + + + + + + 43599
GTCCGGCAGGCGTCCCGGAGGTCCAGCTCAGGCGGCGAGTGCACCATCCGGCGCATGCCC
TGCAACACCGCAGATCCGGAGGCCGGCGAAGGCCCCCGGTCCGGCTCGGTCACAGTCACG
43600 + + + + + + 43659
ACGTTGTGGCGTCTAGGCCTCCGGCCGCTTCCGGGGGCCAGGCCGAGCCAGTGTCAGTGC
TCATTCGCCACGACGCCCATCTTGGGGCGGCGGCGCACAGGACGCTTCTCCTTGAGTGCG
43660 + + + + + + 43719
AGTAAGCGGTGCTGCGGGTAGAACCCCGCCGCCGCGTGTCCTGCGAAGAGGAACTCACGC
GAGCTCCGCGTACGGCGCCGAAGCGTTCGGTCAAACCTTGTTCGACCAACTGCGCAATCT
43720 + + + + + + 43779
CTCGAGGCGCATGCCGCGGCTTCGCAAGCCAGTTTGGAACAAGCTGGTTGACGCGTTAGA
GGAAGTTGACGTCTTCCAGGTGGAGTTGGGAACGATGGAGGCCCCCGCCGGCCGCGTCGG
43780 + + + + + + 43839
CCTTCAACTGCAGAAGGTCCACCTCAACCCTTGCTACCTCCGGGGGCGGCCGGCGCAGCC
AACGGCCGTGCAGTGCGGCCCTCTCCAACACTCCCGGCCATCGCGGAATCCGAGACGTGC
43840 + + + + + + 43899
TTGCCGGCACGTCACGCCGGGAGAGGTTGTGAGGGCCGGTAGCGCCTTAGGCTCTGCACG
CCGAAGGAGCCCCCCTTGCAAGCCTGGTTCAAGCGCACCAGTGGTGTGCCCGGTGACAGA
43900 + + + + + + 43959
GGCTTCCTCGGGGGGAACGTTCGGACCAAGTTCGCGTGGTCACCACACGGGCCACTGTCT
27 (ORF27) V P G D R
CGTGGAAAGTGGCTGGTCCTGGCCGCCTGGCTCATCATCGCGATGGCGCTGGGCCCGCTG
43960 + + + + + + 44019
GCACCTTTCACCGACCAGGACCGGCGGACCGAGTAGTAGCGCTACCGCGACCCGGGCGAC
27 RGKWLVLAAWLIIAMALGPL
GCGGGGAAGCTCGCCGACGTCCAGGACTCCAGCGCCAACGCCTTCCTTCCGCGCAGCTCG
44020 + + + + + + 44079
CGCCCCTTCGAGCGGCTGCAGGTCCTGAGGTCGCGGTTGCGGAAGGAAGGCGCGTCGAGC
27 AGKLADVQDSSANAFLPRSS
GAGTCCGCGAAGCTGAACAAGGAACTGGAGAAGTTCCGCGCCGACGAGCTGATGCCGGCC
44080 + + + + + + 44139
CTCAGGCGCTTCGACTTGTTCCTTGACCTCTTCAAGGCGCGGCTGCTCGACTACGGCCGG
27 ESAKLNKELEKFRADELMPA
GTGGTGGTCTACAGCGCCGACGGCTCGCTGCCCGCCGAGGGGCGGGCCAAGGCCGAGAAG
44140 + + + + + + 44199
CACCACCAGATGTCGCGGCTGCCGAGCGACGGGCGGCTCCCCGCCCGGTTCCGGCTCTTC
27 VVVYSADGSLPAEGRAKAEK
GACATAGCCGCCTTCCAGGAGCTGGCCGCCGAGGGCGAGAAGGTCGAAGCGCCCCTGGAG
44200 + + + + + + 44259
CTGTATCGGCGGAAGGTCCTCGACCGGCGGCTCCCGCTCTTCCAGCTTCGCGGGGACCTC
27 DIAAFQELAAEGEKVEAPLE
TCGGAGGACGGCCAGGCGCTCATGGTCGTCGTTCCGCTGATCAGCGACGCCGACATCGTC
44260 + + + + + + 44319
AGCCTCCTGCCGGTCCGCGAGTACCAGCAGCAAGGCGACTAGTCGCTGCGGCTGTAGCAG
27 SEDGQALMVVVPLISDADIV
GCCACGACGAAGAAGGTCCGCGATGTCGCGGACGCCAACGCCCCCCCGGGCGTCGCCATC
44320 + + + + + + 44379
CGGTGCTGCTTCTTCCAGGCGCTACAGCGCCTGCGGTTGCGGGGGGGCCCGCAGCGGTAG
27 ATTKKVRDVADANAPPGVAI
GAGGTGGGCGGGCCCGCCGGGTCGACGACCGACGCCGCCGGCGCTTTCGAGTCCCTCGAC
44380 + + + + + + 44439
CTCCACCCGCCCGGGCGGCCCAGCTGCTGGCTGCGGCGGCCGCGAAAGCTCAGGGAGCTG
27 EVGGPAGSTTDAAGAFESLD
TCCATGCTGATGATGGTCACCGGCCTTGTGGTCGCCATCCTGCTGCTGATCACCTACCGC
44440 + + + + + + 44499
AGGTACGACTACTACCAGTGGCCGGAACACCAGCGGTAGGACGACGACTAGTGGATGGCG
27 SMLMMVTGLVVAILLLITYR
TCCCCCATCCTGTGGCTGCTGCCCCTGCTCTCCGTCGGCTTCGCCTCCGTGCTGACCCAG
44500 + + + + + + 44559
AGGGGGTAGGACACCGACGACGGGGACGAGAGGCAGCCGAAGCGGAGGCACGACTGGGTC
27 SPILWLLPLLSVGFASVLTQ
GTCGGCACCTACATGCTCGCCAAGTACGCCGGGCTGCCGGTCGACCCGCAGAGCTCCGGC
44560 + + + + + + 44619
CAGCCGTGGATGTACGAGCGGTTCATGCGGCCCGACGGCCAGCTGGGCGTCTCGAGGCCG
27 VGTYMLAKYAGLPVDPQSSG
GTCCTGATGGTCCTCGTGTTCGGTGTCGGCACCGACTACGCCCTGCTGCTCATCGCCCGC
44620 + + + + + + 44679
CAGGACTACCAGGAGCACAAGCCACAGCCGTGGCTGATGCGGGACGACGAGTAGCGGGCG
27 VLMVLVFGVGTDYALLLIAR
TACCGTGAGGAACTGCGCCGCGAGCAGGACCGGCACGTGGCCATGAAGACCGCGTTGCGA
44680 + + + + + + 44739
ATGGCACTCCTTGACGCGGCGCTCGTCCTGGCCGTGCACCGGTACTTCTGGCGCAACGCT
27 YREELRREQDRHVAMKTALR
CGGTCGGGCCCGGCCATCCTGGCCTCGGCCGGCACCATCGCCATCGGCCTCGTCTGCCTG
44740 + + + + + + 44799
GCCAGCCCGGGCCGGTAGGACCGGAGCCGGCCGTGGTAGCGGTAGCCGGAGCAGACGGAC
27 RSGPAILASAGTIAIGLVCL
GTCCTCGCGGACGTCAACTCCTCCCGCTCCATGGGCCTGGTCGGCGCGATCGGCGTGGTC
44800 + + + + + + 44859
CAGGAGCGCCTGCAGTTGAGGAGGGCGAGGTACCCGGACCAGCCGCGCTAGCCGCAGCAG
27 VLADVNSSRSMGLVGAIGVV
TGCGCCCTCCTCGCCATGGTCACGATCCTGCCCGCGCTGCTGGTCATCCTGGGCCGCTGG
44860 + + + + + + 44919
ACGCGGGAGGAGCGGTACCAGTGCTAGGACGGGCGCGACGACCAGTAGGACCCGGCGACC
27 CALLAMVTILPALLVILGRW
GTGTTCTGGCCCTTCGTTCCCCGCTGGACGCCGGAGTCGGCCGCGGCCCCCGAGGCACCG
44920 + + + + + + 44979
CACAAGACCGGGAAGCAAGGGGCGACCTGCGGCCTCAGCCGGCGCCGGGGGCTCCGTGGC
27 VFWPFVPRWTPESAAAPEAP
GCGTCCCACAGCCGCTGGGAGCGCATCGGCTCCGTCACGGCCGCCCGGCCGCGCCGCGCC
44980 + + + + + + 45039
CGCAGGGTGTCGGCGACCCTCGCGTAGCCGAGGCAGTGCCGGCGGGCCGGCGCGGCGCGG
27 ASHSRWERIGSVTAARPRRA
TGGGTGCTGTCCTTGGCCGCGACGGGGCTTCTCGCCCTCAGTTCCCTCGGCCTCGACATG
45040 + + + + + + 45099
ACCCACGACAGGAACCGGCGCTGCCCCGAAGAGCGGGAGTCAAGGGAGCCGGAGCTGTAC
27 WVLSLAATGLLALSSLGLDM
GGACTCACCCAGAGCGAACTGCTCCAGACGAAGCCCGAGTCCGTCGTCGCCCAGGAGCGG
45100 + + + + + + 45159
CCTGAGTGGGTCTCGCTTGACGAGGTCTGCTTCGGGCTCAGGCAGCAGCGGGTCCTCGCC
27 GLTQSELLQTKPESVVAQER
ATCTCCGCCCACTACCCGTCCGGCTCCTCCGACCCCGCCACCGTCGTCGCACCCAGCGCG
45160 + + + + + + 45219
TAGAGGCGGGTGATGGGCAGGCCGAGGAGGCTGGGGCGGTGGCAGCAGCGTGGGTCGCGC
27 ISAHYPSGSSDPATVVAPSA
GACGTGGCCGAGGTCCGCCGGGCCGCCGAGGGGACCGACGGAGTGGTCTCCGTCCAGGAC
45220 + + + + + + 45279
CTGCACCGGCTCCAGGCGGCCCGGCGGCTCCCCTGGCTGCCTCACCAGAGGCAGGTCCTG
27 DVAEVRRAAEGTDGVVSVQD
GGCCCCACCACTCCCGACGGAGAGCTGACCATGCTGTCCGTGGTGCTGAAGGACGTTCCC
45280 + + + + + + 45339
CCGGGGTGGTGAGGGCTGCCTCTCGACTGGTACGACAGGCACCACGACTTCCTGCAAGGG
27 GPTTPDGELTMLSVVLKDVP
GACAGCAGCGGGGCCAAGGACACCATCGATGCACTGCGGGACAACACGGATGCTCTCGTG
45340 + + + + + + 45399
CTGTCGTCGCCCCGGTTCCTGTGGTAGCTACGTGACGCCCTGTTGTGCCTACGAGAGCAC
27 DSSGAKDTIDALRDNTDALV
GGGGGTACGACGGCCCAGAGCCTGGACACCCAGCGCGCCTCGGTCCGTGACCTCTGGGTC
45400 + + + + + + 45459
CCCCCATGCTGCCGGGTCTCGGACCTGTGGGTCGCGCGGAGCCAGGCACTGGAGACCCAG
27 GGTTAQSLDTQRASVRDLWV
ACCGTCCCCGCGGTCCTGCTGGTGGTCCTGCTCGTCCTGATCTGGCTGCTGCGCTCGGTC
45460 + + + + + + 45519
TGGCAGGGGCGCCAGGACGACCACCAGGACGAGCAGGACTAGACCGACGACGCGAGCCAG
27 TVPAVLLVVLLVLIWLLRSV
ACCGGACCGCTGATCATGCTCGGCACCGTGGTCGTGTCGTTCTTCGCGGCCCTGGGGGCG
45520 + + + + + + 45579
TGGCCTGGCGACTAGTACGAGCCGTGGCACCAGCACAGCAAGAAGCGCCGGGACCCCCGC
27 TGPLIMLGTVVVSFFAALGA
TCCAACCTGCTCTTCGAGTACGTGATGGGGCACGCCGGCGTCGACTGGTCGGTGCCGCTT
45580 + + + + + + 45639
AGGTTGGACGAGAAGCTCATGCACTACCCCGTGCGGCCGCAGCTGACCAGCCACGGCGAA
27 SNLLFEYVMGHAGVDWSVPL
CTCGGGTTCGTGTACCTGGTCGCCCTCGGAATCGACTACAACATCTTCCTCATGCACCGG
45640 + + + + + + 45699
GAGCCCAAGCACATGGACCAGCGGGAGCCTTAGCTGATGTTGTAGAAGGAGTACGTGGCC
27 LGFVYLVALGIDYNIFLMHR
GTGAAGGAGGAGGTCGCTCTGCACGGCCATGCCAAGGGCGTGCTCACCGGCCTGACCACC
45700 + + + + + + 45759
CACTTCCTCCTCCAGCGAGACGTGCCGGTACGGTTCCCGCACGAGTGGCCGGACTGGTGG
27 VKEEVALHGHAKGVLTGLTT
ACCGGGGGCGTCATCACCAGTGCCGGCGTGGTCCTGGCCGCGACGTTCGCCGTCATCGCC
45760 + + + + + + 45819
TGGCCCCCGCAGTAGTGGTCACGGCCGCACCAGGACCGGCGCTGCAAGCGGCAGTAGCGG
27 TGGVITSAGVVLAATFAVIA
ACACTGCCGCTGGTCCCGATGGCCCAGATGGGTGTCGTGGTCGGCCTGGGCATTCTGCTG
45820 + + + + + + 45879
TGTGACGGCGACCAGGGCTACCGGGTCTACCCACAGCACCAGCCGGACCCGTAAGACGAC
27 TLPLVPMAQMGVVVGLGILL
GACACCTTCCTCGTCCGGACGATTCTTCTGCCGGCCCTGGCGCTCGATCTGGGGCCCCGG
45880 + + + + + + 45939
CTGTGGAAGGAGCAGGCCTGCTAAGAAGACGGCCGGGACCGCGAGCTAGACCCCGGGGCC
27 DTFLVRTILLPALALDLGPR
TTCTGGTGGCCGGGCGCGCTGTCGAAGACGTCCGGGGGACCGGCCCCCGTCCGCGAGGAC
45940 + + + + + + 45999
AAGACCACCGGCCCGCGCGACAGCTTCTGCAGGCCCCCTGGCCGGGGGCAGGCGCTCCTG
27 FWWPGALSKTSGGPAPVRED
CGCACGTCCCAGCCCGTGGGCTGAGACCCGTCCCGACGAGACCCGTACGGCGGGCGGCCG
46000 + + + + + + 46059
GCGTGCAGGGTCGGGCACCCGACTCTGGGCAGGGCTGCTCTGGGCATGCCGCCCGCCGGC
27 RTSQPVG* (ORF27)
GTTCCCCCGGGCCGTACGACTGAGCAACCCAGAAGATGGGCCGCCCGCGACCAGGCGTCA
46060 + + + + + + 46119
CAAGGGGGCCCGGCATGCTGACTCGTTGGGTCTTCTACCCGGCGGGCGCTGGTCCGCAGT
CGATGGTGGCCCACCGGCCGCAGGCCGATCTCCCGGAAGGAAGCGCCGTGTTGGGCGATG
46120 + + + + + + 46179
GCTACCACCGGGTGGCCGGCGTCCGGCTAGAGGGCCTTCCTTCGCGGCACAACCCGCTAC
28 (ORF28) V L G D E -
AGGACGGCAAGGCCGCCGAGCTGTGGTCGATGGCGAACCTGGGTACACCGATGGCCGTGC
46180 + + + + + + 46239
TCCTGCCGTTCCGGCGGCTCGACACCAGCTACCGCTTGGACCCATGTGGCTACCGGCACG
28 DGKAAELWSMANLGTPMAVR-
GCGTCGCGGCGACCCTGCGCATCGCCGACCACATCACGGCCGGAGCGCACACCGCCGGCG
46240 + + + + + + 46299
CGCAGCGCCGCTGGGACGCGTAGCGGCTGGTGTAGTGCCGGCCTCGCGTGTGGCGGCCGC
28 VAATLRIADHITAGAHTAGE-
AAATCGCCGAAGCGGCCGCCGTGCACGAGGAATCCCTCGACCGGCTGCTGCGCTACCTCA
46300 + + + + + + 46359
TTTAGCGGCTTCGCCGGCGGCACGTGCTCCTTAGGGAGCTGGCCGACGACGCGATGGAGT
28 IAEAAAVHEESLDRLLRYLT-
CCGTCCGGGGCCTGCTGGACCGTGACGGGCTCGGCCGGTACACGCTGACCCCCCTGGGCC
46360 + + + + + + 46419
GGCAGGCCCCGGACGACCTGGCACTGCCCGAGCCGGCCATGTGCGACTGGGGGGACCCGG
28 VRGLLDRDGLGRYTLTPLGR-
GGCCGCTGTGCGAGGACCACCCCGCCGGCGTCCGGGCCTGGTTCGACATGGAGGGAGCGG
46420 + + + + + + 46479
CCGGCGACACGCTCCTGGTGGGGCGGCCGCAGGCCCGGACCAAGCTGTACCTCCCTCGCC
28 PLCEDHPAGVRAWFDMEGAG-
GGCGGGGCGAGCTGTCGTTCGTCGACCTGCTGCACAGCGTACGGACCGGGAAGGCCGCCT
46480 + + + + + + 46539
CCGCCCCGCTCGACAGCAAGCAGCTGGACGACGTGTCGCATGCCTGGCCCTTCCGGCGGA
28 RGELSFVDLLHSVRTGKAAF-
TCCCCCTGCGCTACGGCCGCCCCTTCTGGGAGGACCTGGCGGAGGACCCCCGCCGCGCGG
46540 + + + + + + 46599
AGGGGGACGCGATGCCGGCGGGGAAGACCCTCCTGGACCGCCTCCTGGGGGCGGCGCGCC
28 PLRYGRPFWEDLAEDPRRAE-
AGTCCTTCAACCGGCTGCTCGGCCAGGACGTCGCCACTCGCGCCCCGGCCGTGGTGGCCG
46600 + + + + + + 46659
TCAGGAAGTTGGCCGACGAGCCGGTCCTGCAGCGGTGAGCGCGGGGCCGGCACCACCGGC
28 SFNRLLGQDVATRAPAVVAG-
GCTTCGACTGGGCGAGCACCGGTCATGTCATCGACCTCGGAGGCGGCGACGGCTCCCTGC
46660 + + + + + + 46719
CGAAGCTGACCCGCTCGTGGCCAGTACAGTAGCTGGAGCCTCCGCCGCTGCCGAGGGACG
28 FDWASTGHVIDLGGGDGSLL-
TGACCGCACTGCTGACCGCCTGTCCGTCACTGCGCGGCACGGTCCTGGACCTGCCCGAAG
46720 + + + + + + 46779
ACTGGCGTGACGACTGGCGGACAGGCAGTGACGCGCCGTGCCAGGACCTGGACGGGCTTC
28 TALLTACPSLRGTVLDLPEA-
CGGTGCAGCGTGCCAAGGAGTCGTTCGCCGTGTCCGGACTGGACGACCGGGCGAACGCGG
46780 + + + + + + 46839
GCCACGTCGCACGGTTCCTCAGCAAGCGGCACAGGCCTGACCTGCTGGCCCGCTTGCGCC
28 VQRAKESFAVSGLDDRANAV-
TCGCGGGCAGCTTCTTCGACGCCCTCCCCGCCGGCGCGGGCGCCTACGTCCTGTCCCTGG
46840 + + + + + + 46899
AGCGCCCGTCGAAGAAGCTGCGGGAGGGGCGGCCGCGCCCGCGGATGCAGGACAGGGACC
28 AGSFFDALPAGAGAYVLSLV-
TCCTGCACGACTGGGACGACGAGGCGTCCGTCGCGATCCTGCGGCGCTGCGCCGAGGCGG
46900 + + + + + + 46959
AGGACGTGCTGACCCTGCTGCTCCGCAGGCAGCGCTAGGACGCCGCGACGCGGCTCCGCC
28 LHDWDDEASVAILRRCAEAA-
CGGGGCAGACGGGATCGGTGTTCGTCATCGAGTCGACCGGCTCGGCGGGGGACGCCCCGC
46960 + + + + + + 47019
GCCCCGTCTGCCCTAGCCACAAGCAGTAGCTCAGCTGGCCGAGCCGCCCCCTGCGGGGCG
28 GQTGSVFVIESTGSAGDAPH-
ACACAGGTATGGACCTGCGCATGCTGTGCATCTACGGAGCCAAGGAGCGCCGCGTGGAGG
47020 + + + + + + 47079
TGTGTCCATACCTGGACGCGTACGACACGTAGATGCCTCGGTTCCTCGCGGCGCACCTCC
28 TGMDLRMLCIYGAKERRVEE-
AGTTCGAGGAACTCGCCGGCCGGGCCGGGCTCCGGGTCGTCGCCGTCCACCCCGCGGGCC
47080 + + + + + + 47139
TCAAGCTCCTTGAGCGGCCGGCCCGGCCCGAGGCCCAGCAGCGGCAGGTGGGGCGCCCGG
28 FEELAGRAGLRVVAVHPAGP-
CTTCCGCGATCATCCAGATGTCCGCGGTCTGACCGCCCGGAGCCCCGGCCCATCGCGGCG
47140 + + + + + + 47199
GAAGGCGCTAGTAGGTCTACAGGCGCCAGACTGGCGGGCCTCGGGGCCGGGTAGCGCCGC
28 SAIIQMSAV* (ORF28)
CGGGCCACGGCAGACAAGGAGAGAGCGTATGGCCGGCCTGGTCATGTCGCCGGTGGAGGC
47200 + + + + + + 47259
GCCCGGTGCCGTCTGTTCCTCTCTCGCATACCGGCCGGACCAGTACAGCGGCCACCTCCG
(ORF29) MAGLVMS P V E A -
GCTCGACGCGCTGGGCACGGTGCAGGGGCGTCAGGACCCCTATCCCTTCTACGAGGCGAT
47260 + + + + + + 47319
CGAGCTGCGCGACCCGTGCCACGTCCCCGCAGTCCTGGGGATAGGGAAGATGCTCCGCTA
29 LDALGTVQGRQDPYPFYEAI
CCGCGCGCACGGGCAGGCGGTCCCCACGAAGCCCGGCCGCTTCGTGGTGGTCGGCCACGA
47320 + + + + + + 47379
GGCGCGCGTGCCCGTCCGCCAGGGGTGCTTCGGGCCGGCGAAGCACCACCAGCCGGTGCT
29 RAHGQAVPTKPGRFVVVGHD
CGCGTGCGACCGGGCGCTGCGGGAACCGGCCCTGCGCGTCCAGGACGCCAGGAGCTACGA
47380 + + + + + + 47439
GCGCACGCTGGCCCGCGACGCCCTTGGCCGGGACGCGCAGGTCCTGCGGTCCTCGATGCT
29 ACDRALREPALRVQDARSYD
CGTCGTCTTCCCCTCGTGGCGGTCGCACTCCTCGGTCCGGGGGTTCACCAGCTCCATGCT
47440 + + + + + + 47499
GCAGCAGAAGGGGAGCACCGCCAGCGTGAGGAGCCAGGCCCCCAAGTGGTCGAGGTACGA
29 VVFPSWRSHSSVRGFTSSML-
CTACAGCAACCCGCCCGATCACGGCCGGTTGCGCCAGGTGGTGAGCTTCGCGTTCACCCC
47500 + + + + + + 47559
GATGTCGTTGGGCGGGCTAGTGCCGGCCAACGCGGTCCACCACTCGAAGCGCAAGTGGGG
29 YSNPPDHGRLRQVVSFAFTP
GCCCAAGGTGCGCCGGATGCACGGGGTGATCGAGGACATGACCGACCGGCTCCTCGACCG
47560 + + + + + + 47619
CGGGTTCCACGCGGCCTACGTGCCCCACTAGCTCCTGTACTGGCTGGCCGAGGAGCTGGC
29 PKVRRMHGVIEDMTDRLLDR
GATGGCCCGGCTCGGCTCCGGCGGCTCCCCGGTCGACCTCATAGCCGAGTTCGCCGCCCG
47620 + + + + + + 47679
CTACCGGGCCGAGCCGAGGCCGCCGAGGGGCCAGCTGGAGTATCGGCTCAAGCGGCGGGC
29 MARLGSGGSPVDLIAEFAAR-
GCTGCCCGTCGCGGTGATCAGCGAGATGATCGGCTTTCCGGCGAAGGACCAGGTGTGGTT
47680 + + + + + + 47739
CGACGGGCAGCGCCACTAGTCGCTCTACTAGCCGAAAGGCCGCTTCCTGGTCCACACCAA
29 LPVAVISEMIGFPAKDQVWF
CCGCGACATGGCCTCCCGGGTCGCCGTGGCGACGGACGGTTTCACCGACCCCGGCGCGCT
47740 + + + + + + 47799
GGCGCTGTACCGGAGGGCCCAGCGGCACCGCTGCCTGCCAAAGTGGCTGGGGCCGCGCGA
29 RDMASRVAVATDGFTDPGAL
CACGGGGGCCGACGCCGCCATGGACGAGATGAGCGCCTACTTCGACGACCTCCTGGACCG
47800 + + + + + + 47859
GTGCCCCCGGCTGCGGCGGTACCTGCTCTACTCGCGGATGAAGCTGCTGGAGGACCTGGC
29 TGADAAMDEMSAYFDDLLDR
TCGCCGCCGCACCCCGGCCGACGACCTGGTCACCCTGCTCGCCGAGGCCCACGACGGCTC
47860 + + + + + + 47919
AGCGGCGGCGTGGGGCCGGCTGCTGGACCAGTGGGACGAGCGGCTCCGGGTGCTGCCGAG
29 RRRTPADDLVTLLAEAHDGS
CCCCGGGCGCCTGGACCACGACGAACTGATGGGCACCATGATGGTGCTGCTCACAGCCGG
47920 + + + + + + 47979
GGGGCCCGCGGACCTGGTGCTGCTTGACTACCCGTGGTACTACCACGACGAGTGTCGGCC
29 PGRLDHDELMGTMMVLLTAG-
GTTCGAGACCACGAGCTTTCTGATCGGCCACGGGGCGATGATCGCCCTCGAACAACGGGC
47980 + + + + + + 48039
CAAGCTCTGGTGCTCGAAAGACTAGCCGGTGCCCCGCTACTAGCGGGAGCTTGTTGCCCG
29 FETTSFLIGHGAMIALEQRA-
GCACGCGGCCCGGCTGCGGGCCGAACCCGACTTCGCCGACGGCTACGTCGAGGAGATCCT
48040 + + + + + + 48099
CGTGCGCCGGGCCGACGCCCGGCTTGGGCTGAAGCGGCTGCCGATGCAGCTCCTCTAGGA
29 HAARLRAEPDFADGYVEEIL
CAGGTTCGAGCCGCCGGTCCACGTCACCAGCCGGTGGGCTGCCGAGGACCTCGACCTGCT
48100 + + + + + + 48159
GTCCAAGCTCGGCGGCCAGGTGCAGTGGTCGGCCACCCGACGGCTCCTGGAGCTGGACGA
29 RFEPPVHVTSRWAAEDLDLL-
GGGCCTGTCCGTACCGGCGGGCTCCAAGCTGGTCCTGATCCTGGCCGCCGCGAATCGCGA
48160 + + + + + + 48219
CCCGGACAGGCATGGCCGCCCGAGGTTCGACCAGGACTAGGACCGGCGGCGCTTAGCGCT
29 GLSVPAGSKLVLILAAANRD
TCCCGGCCGCTACCCCGAGCCCGGCCGCTTCGACCCCGACCGCTACGCGCCCCGGCCGGG
48220 + + + + + + 48279
AGGGCCGGCGATGGGGCTCGGGCCGGCGAAGCTGGGGCTGGCGATGCGCGGGGCCGGCCC
29 PGRYPEPGRFDPDRYAPRPG-
CGGGCCGGAGGCCACCAGACCGCTGAGCTTCGGCGCGGGCGGCCACTTCTGCCTCGGCGC
48280 + + + + + + 48339
GCCCGGCCTCCGGTGGTCTGGCGACTCGAAGCCGCGCCCGCCGGTGAAGACGGAGCCGCG
29 GPEATRPLSFGAGGHFCLGA-
TCCGCTGGCGCGGCTGGAAGCCCGGATCGCGCTGCCGCGTCTGCTGCGCCGCTTCCCGGA
48340 + + + + + + 48399
AGGCGACCGCGCCGACCTTCGGGCCTAGCGCGACGGCGCAGACGACGCGGCGAAGGGCCT
29 PLARLEARIALPRLLRRFPD-
CCTGGCCGTGTCCGAGCCCCCCGTCTACCGCGACCGCTGGGTCGTCCGCGGCCTCGAAAC
48400 + + + + + + 48459
GGACCGGCACAGGCTCGGGGGGCAGATGGCGCTGGCGACCCAGCAGGCGCCGGAGCTTTG
29 LAVSEPPVYRDRWVVRGLET-
CTTTCCCGTGACCCTCGGGTCCTGAGCCCCCGCCGGCCGGAACACGTGACCGTCCCGGCC
48460 + + + + + + 48519
GAAAGGGCACTGGGAGCCCAGGACTCGGGGGCGGCCGGCCTTGTGCACTGGCAGGGCCGG
29 FPVTLGS* (ORF29)
GGCGGGTGCGCGCCCTCTCAGACGTACAGGGTGTTGGGCCCCTGACCACACAGCACCCGG
48520 + + + + + + 48579
CCGCCCACGCGCGGGAGAGTCTGCATGTCCCACAACCCGGGGACTGGTGTGTCGTGGGCC
CCGTACAGCTCCAGGTTGGTGCTCGGGTTCATGCAGGTGCAGCGTGATGCTCTGGGCATC
48580 + + + + + + 48639
GGCATGTCGAGGTCCAACCACGAGCCCAAGTACGTCCACGTCGCACTACGAGACCCGTAG
30 (ORF30)* APAAHHEPC
GCTGCACGCGCTGGATCGGGACGTCGTTGTAGATCGAGGACCCGCCGCTCGCCTGGGCGA
48640 + + + + + + 48699
CGACGTGCGCGACCTAGCCCTGCAGCAACATCTAGCTCCTGGGCGGCGAGCGGACCCGCT
30 RQVRQIPVDNYISSGGSAQA
GGATGTCCACCGACTCCTTGCCCAGTCGGCACGCCCGCCCCAGCAGGCCGCGGCACAGCA
48700 + + + + + + 48759
CCTACAGGTGGCTGAGGAACGGGTCAGCCGTGCGGGCGGGGTCGTCCGGCGCCGTGTCGT
30 LIDVSEKGLRCARGLLGRCL
CCCGCTCCTCCAGCGTCCAGGCCTCGCCCGAAGCCCCCTTGGAGTCGACGAGGTCGGCCA
48760 + + + + + + 48819
GGGCGAGGAGGTCGCAGGTCCGGAGCGGGCTTCGGGGGAACCTCAGCTGCTCCAGCCGGT
30 VREELTWAEGSAGKSDVLDA
GCCGATGGGCGTGGAACCGTGCCTCGTCGGCCAGCAGGGTCGCCTCGCCGAGCTGCAGGT
48820 + + + + + + 48879
CGGCTACCCGCACCTTGGCACGGAGCAGCCGGTCGTCCCAGCGGAGCGGCTCGACGTCCA
30 LRHAHFRAEDALLTAEGLQL
GGGTGATCGGCGCCGAGCCCTGCTCCTCGTACTCGGTGTAGGTGATCTTGCGGCCGGGCA
48880 + + + + + + 48939
CCCACTAGCCGCGGCTCGGGACGAGGAGCATGAGCCACATCCACTAGAACGCCGGCCCGT
30 HTIPASGQEEYETYTIKRGP
GCCTCCCGCGGAAGACGTCCTGAGCGGCCGCGGCCAGTCCGGTCATGGTGCCGACCGACG
48940 + + + + + + 48999
CGGAGGGCGCCTTCTGCAGGACTCGCCGGCGCCGGTCAGGCCAGTACCACGGCTGGCTGC
30 LRGRFVDQAAAALGTMTGVS
AGGCCGAGGCCACGGCCAGCATCGGCGCCCGGAACATCGGTGATCCGGCGTTGAGTTCGG
49000 + + + + + + 49059
TCCGGCTCCGGTGCCGGTCGTAGCCGCGGGCCTTGTAGCCACTAGGCCGCAACTCAAGCC
30 SASAVALMPARFMPSGANLE
AGGCGTACTGCTGCTGGAGCACCGCGCCCAGCGGAAGGACGCGCTCCTGGGGAACGAAGA
49060 + + + + + + 49119
TCCGCATGACGACGACCTCGTGGCGCGGGTCGCCTTCCTGCGCGAGGACCCCTTGCTTCT
30 SAYQQQLVAGLPLVREQPVF
CGTCCGCGGCGATGGTGCTGACGCTTCCCGAGCCCCGGAGCCCCGAGGTGTGCCAGTCGT
49120 + + + + + + 49179
GCAGGCGCCGCTACCACGACTGCGAAGGGCTCGGGGCCTCGGGGCTCCACACGGTCAGCA
30 VDAAITSVSGSGRLGSTHWD
CGACGATCTGCAGCTGGTCGGTCGGCACCAGGGCCATCACGGGCTGCATGCCGCCGTCGG
49180 + + + + + + 49239
GCTGCTAGACGTCGACCAGCCAGCCGTGGTCCCGGTAGTGCCCGACGTACGGCGGCAGCC
30 DVIQLQDTPVLAMVPQMGGD
GGGTCGGTGAGACGGCGATCAGAACCTGCCAGTGACTGTGCCAGGCACCGCTGATGAAGC
49240 + + + + + + 49299
CCCAGCCACTCTGCCGCTAGTCTTGGACGGTCACTGACACGGTCCGTGGCGACTACTTCG
30 PTPSVAILVQWHSHWAGSIF
CCCACTTGCCGTTCACTACGACACCGCCGTCGACCGGGGCCGCCATGCCGCCGGGACTGA
49300 + + + + + + 49359
GGGTGAACGGCAAGTGATGCTGTGGCGGCAGCTGGCCCCGGCGGTACGGCGGCCCTGACT
30 GWKGNVVVGGDVPAAMGGPS
GGGTGCCGGAGACCCGGACATCCGGCCGGGAGAACACCTCGTCCTGCACGTGGTCGGGGA
49360 + + + + + + 49419
CCCACGGCCTCTGGGCCTGTAGGCCGGCCCTCTTGTGGAGCAGGACGTGCACCAGCCCCT
30 LTGSVRVDPRSFVEDQVHDP
AGAGGCCCGCCATCCAGGTGGGTATCCACCACACCGAGGCCGTCCAGGCGGCCGATCCGT
49420 + + + + + + 49479
TCTCCGGGCGGTAGGTCCACCCATAGGTGGTGTGGCTCCGGCAGGTCCGCCGGCTAGGCA
30 FLGAMWTPIWWVSATWAASG
CGCCGCGCGCCAGCTCGGCGGCCACGTCCACCAGGGTGCGGGCGTCGGACTCGAAGCCGC
49480 + + + + + + 49539
GCGGCGCGCGGTCGAGCCGCCGGTGCAGGTGGTCCCACGCCCGCAGCCTGAGCTTCGGCG
30 DGRALEAAVDVLTRADSEFG
CGTAACGGGCCGGCACGCGCATGCGGAAGATCCCGGCTTCGGCCATCGCCTCGACCGACT
49540 + + + + + + 49599
GCATTGCCCGGCCGTGCGCGTACGCCTTCTAGGGCCGAAGCCGGTAGCGGAGCTGGCTGA
30 GYRAPVRMRFIGAEAMAEVS
CCTCGTGCAGCCGCCGGTTCTCCTCGGTCCAGGCCGCGTGGGACTGGAGCAGCGGCCTCA
49600 + + + + + + 49659
GGAGCACGTCGGCGGCCAAGAGGAGCCAGGTCCGGCGCACCCTGACCTCGTCGCCGGAGT
30 EEHLRRNEETWAAHSQLLPR
GCTTCGAGGCCCGTTCCACCAGTTCGGTACGGGCGGGCGTAGACGTCTGGTCCACTCGAT
49660 + + + + + + 49719
CGAAGCTCCGGGCAAGGTGGTCAAGCCATGCCCGCCCGCATCTGCAGACCAGGTGAGCTA
30 LKSAREVLETRAPTSTQDV (ORF30)
CCTCCAGGAATCATGAGACGCCCTGTCCGCGGTATGCGGAAGCAGGCGTCTGCGCGCATC
49720 + + + + + + 49779
GGAGGTCCTTAGTACTCTGCGGGACAGGCGCCATACGCCTTCGTCCGCAGACGCGCGTAG
GGTCAGGACGGCGTCGCCCTGCTCCCGCATGGTTCACCGAGTTCCGCGGACGTCGCATCT
49780 + + + + + + 49839
CCAGTCCTGCCGCAGCGGGACGAGGGCGTACCAAGTGGCTCAAGGCGCCTGCAGCGTAGA
CCTTGATTGCCGGTCACCTACCCCGATGCCGATCGGGCTGGTGCGACAGCGCATCCCACG
49840 + + + + + + 49899
GGAACTAACGGCCAGTGGATGGGGCTACGGCTAGCCCGACCACGCTGTCGCGTAGGGTGC
AGAAGTCCACGAACGGTCCGGGAAGCCAGAATGTGCTTCTCGGCCGGAGTCACGGCCGGC
49900 + + + + + + 49959
TCTTCAGGTGCTTGCCAGGCCCTTCGGTCTTACACGAAGAGCCGGCCTCAGTGCCGGCCG
GCCGGCGCCCGTCGCCGGTCACGCCGGACCACGCCCGGACCGGTCATGGAGGCAGCCCAT
49960 + + + + + + 50019
CGGCCGCGGGCAGCGGCCAGTGCGGCCTGGTGCGGGCCTGGCCAGTACCTCCGTCGGGTA
GAGTGACAACGACAGTCCGTCCCGGGTGCCGGCCGCGGTGGCACCCGCCACCGCGAAACC
50020 + + + + + + 50079
CTCACTGTTGCTGTCAGGCAGGGCCCACGGCCGGCGCCACCGTGGGCGGTGGCGCTTTGG
GTCGGCCGGCACGGTCCTCGGCGCCGCGGTGGCTTCGCCCGCCGCCTACACCGCGGCGAC
50080 + + + + + + 50139
CAGCCGGCCGTGCCAGGAGCCGCGGCGCCACCGAAGCGGGCGGCGGATGTGGCGCCGCTG
CGCCCAGGAAGCGGCGACCGCGCTGGTCCGCATGCTGATGGAACAGATGGTGCTCGGTCC
50140 + + + + + + 50199
GCGGGTCCTTCGCCGCTGGCGCGACCAGGCGTACGACTACCTTGTCTACCACGAGCCAGG
CGGCGCGGTCGGTCCCGAGACCCGCGCGGACGGCCCGGCGGGGCGGACCGGCTCCGGCCA
50200 + + + + + + 50259
GCCGCGCCAGCCAGGGCTCTGGGCGCGCCTGCCGGGCCGCGCCGCCTGGCCGAGGCCGGT
CGGCCCGGCGCCGCAGACCGGACCGGACGCGCCGGGCGAACCCCCGCCCACGTGGGCGCC
50260 + + + + + + 50319
GCCGGGCCGCGGCGTCTGGCCTGGCCTGCGCGGCCCGCTTGGGGGCGGGTGCACCCGCGG
GAACCTCGACGACGGGAAGGTAGGAGGACGATGAGGCCGCTCGTTCGGGCAGTGCTGCGG
50320 + + + + + + 50379
CTTGGAGCTGCTGCCCTTCCATCCTCCTGCTACTCCGGCGAGCAAGCCCGTCACGACGCC
31 (ORF31) MRPLVRAVLR-
GGTTCCCTGCGGCAGGTGAGGTACGTGGACGTGGTCTCCCCGCGCCGGGCGCGCTCCCTG
50380 + + + + + + 50439
CCAAGGGACGCCGTCCACTCCATGCACCTGCACCAGAGGGGCGCGGCCCGCGCGAGGGAC
31 GSLRQVRYVDVVSPRRARSL
GTGGCGCGGGTGTACCGGGAGACCGAGGAGCAGTTCGGCGTGCTCGCGCCCCCCCTGGCC
50440 + + + + + + 50499
CACCGCGCCCACATGGCCCTCTGGCTCCTCGTCAAGCCGCACGAGCGCGGGGGGGACCGG
31 VARVYRETEEQFGVLAPPLA
CTCCACTCGCCCGCCGCGGCGTCGCTGGCCGCGACGTGGCTCATGCTGCGGGAGACACTG
50500 + + + + + + 50559
GAGGTGAGCGGGCGGCGCCGCAGCGACCGGCGCTGCACCGAGTACGACGCCCTCTGTGAC
31 LHSPAAASLAATWLMLRETL
CTGGTCGACGGGCGGGTGAGCCGGGCGGTGAAGGAGACGGTCGCCACCGAGGTCTCCCGT
50560 + + + + + + 50619
GACCAGCTGCCCGCCCACTCGGCCCGCCACTTCCTCTGCCAGCGGTGGCTCCAGAGGGCA
31 LVDGRVSRAVKETVATEVSR
GCCAACGACTGTCCGTACTGCGTCCAGGTCCATCAGGCGGTACTCGGGACACTGCCTCCG
50620 + + + + + + 50679
CGGTTGCTGACAGGCATGACGCAGGTCCAGGTAGTCCGCCATGAGCCCTGTGACGGAGGC
31 ANDCPYCVQVHQAVLGTLPP
GACGGCGGCCAGGCCGGGCTCCTGCGGTGGGTCCGGGAGGCAGGCCGACGGCCCGGCGGC
50680 + + + + + + 50739
CTGCCGCCGGTCCGGCCCGAGGACGCCACCCAGGCCCTCCGTCCGGCTGCCGGGCCGCCG
31 DGGQAGLLRWVREAGRRPGG
GGTGCGGTGGGCGGCGGGCGGCCGCTTCCGTTCAGCGGTGAACAGGCACCGGAACTGTGC
50740 + + + + + + 50799
CCACGCCACCCGCCGCCCGCCGGCGAAGGCAAGTCGCCACTTGTCCGTGGCCTTGACACG
31 GAVGGGRPLPFSGEQAPELC
GGCGTCGTGGTCACGTTCCACTACATCAACCGCATGGTCTCCCTCTTCCTCGACGACTCC
50800 + + + + + + 50859
CCGCAGCACCAGTGCAAGGTGATGTAGTTGGCGTACCAGAGGGAGAAGGAGCTGCTGAGG
31 GVVVTFHYINRMVSLFLDDS
CCCATGCCGACCCGGACGCCGACACCGTTGCGCGGGCCCATCATGAGGACCACCGCACTG
50860 + + + + + + 50919
GGGTACGGCTGGGCCTGCGGCTGTGGCAACGCGCCCGGGTAGTACTCCTGGTGGCGTGAC
31 PMPTRTPTPLRGPIMRTTAL
GCCATGCGTCCCGTCGGCCCGGGGCTGCTGACACCGGGCGCATCGCTCGGCCTGCTGCCT
50920 + + + + + + 50979
CGGTACGCAGGGCAGCCGGGCCCCGACGACTGTGGCCCGCGTAGCGAGCCGGACGACGGA
31 AMRPVGPGLLTPGASLGLLP
CCGGCTCCCCTGCCGCCCGGACTGGAGTGGGCCGAGGGCAACCCTTTCGTGGCCCAGGCG
50980 + + + + + + 51039
GGCCGAGGGGACGGCGGGCCTGACCTCACCCGGCTCCCGTTGGGAAAGCACCGGGTCCGC
31 PAPLPPGLEWAEGNPFVAQA
CTGGGGCGTGCCGTCGCCGCTGTGGACCAGGGAGCGCACTGGGTGCCCGAACCGGTCCGG
51040 + + + + + + 51099
GACCCCGCACGGCAGCGGCGACACCTGGTCCCTCGCGTGACCCACGGGCTTGGCCAGGCC
31 LGRAVAAVDQGAHWVPEPVR
GAGCGGCTGCGCACACGTCTGGACACCTGGGACGGATCGGCGCCGGGCCTCGGCCGGGGA
51100 + + + + + + 51159
CTCGCCGACGCGTGTGCAGACCTGTGGACCCTGCCTAGCCGCGGCCCGGAGCCGGCCCCT
31 ERLRTRLDTWDGSAPGLGRG
TGGCTCGACGAGGCCGTGTCCGGCCTGCCGCCCCAGGACGTGCCCGCGGCACGGCTGGCG
51160 + + + + + + 51219
ACCGAGCTGCTCCGGCACAGGCCGGACGGCGGGGTCCTGCACGGGCGCCGTGCCGACCGC
31 WLDEAVSGLPPQDVPAARLA
CTGCTGACGGCCTTCGCCCCCTACCAGGTGCTCCCGGACGACGTCGAGGAGTTCAGACGG
51220 + + + + + + 51279
GACGACTGCCGGAAGCGGGGGATGGTCCACGAGGGCCTGCTGCAGCTCCTCAAGTCTGCC
31 LLTAFAPYQVLPDDVEEFRR
CGTCGGCCCACCGACCGCGAACTCGTCGAGCTCACGTCCTACGCCGCGCTGACCACGGCC
51280 + + + + + + 51339
GCAGCCGGGTGGCTGGCGCTTGAGCAGCTCGAGTGCAGGATGCGGCGCGACTGGTGCCGG
31 RRPTDRELVELTSYAALTTA
GTCCGTGTCGGTCGCACGCTCGTCGTGCCCGACGCCGCCGGGCCGGGATGAACGGCCCCG
51340 + + + + + + 51399
CAGGCACAGCCAGCGTGCGAGCAGCACGGGCTGCGGCGGCCCGGCCCTACTTGCCGGGGC
31 VRVGRTLVVPDAAGPG* (ORF31)
CAACGGCTCGGGAAGGCTGTCTCACGGCCGGAGGCGTACGCCGGTGAGGTGCTCGGACTC
51400 + + + + + + 51459
GTTGCCGAGCCCTTCCGACAGAGTGCCGGCCTCCGCATGCGGCCACTCCACGAGCCTGAG
(ORF32) * PRLRVGTLHE S E -
CTCCCAGAGGCGGCGCCGGGCCCTGGGGTCGACGGCTGCTCCGCCGGGGCGCACGAGCCC
51460 + + + + + + 51519
GAGGGTCTCCGCCGCGGCCCGGGACCCCAGCTGCCGACGAGGCGGCCCCGCGTGCTCGGG
32 EWLRRRARPDVAAGGPRVLG-
GGGTGCGCCCCGGGTCTCGGTCACGCCGAGGGGCCCGTAGAACTCGCCCCCGCGCGCGCC
51520 + + + + + + 51579
CCCACGCGGGGCCCAGAGCCAGTGCGGCTCCCCGGGCATCTTGAGCGGGGGCGCGCGCGG
32 PAGRTETVGLPGYFEGGRAG-
GGGATCGGTGGCCGCCCGCAGACCAGGCAGCATCCCCGCCGCGGCGGGCTGCAGGAACAA
51580 + + + + + + 51639
CCCTAGCCACCGGCGGGCGTCTGGTCCGTCGTAGGGGCGGCGCCGCCCGACGTCCTTGTT
32 PDTAARLGPLMGAAAPQLFL-
CGGGGCGAGCGGGGAGCCGAGCCTGCGCACGGGCGCGGGAAAGTCCCGGCCCAGACCGGT
51640 + + + + + + 51699
GCCCCGCTCGCCCCTCGGCTCGGACGCGTGCCCGCGCCCTTTCAGGGCCGGGTCTGGCCA
32 PALPSGLRRVPAPFDRGLGT-
CGCGGTCAGCCCGGGATGAGCGGCGAGCGAGGCCAGTTCCGCGCCGGACTCCGCCAGTCT
51700 + + + + + + 51759
GCGCCAGTCGGGCCCTACTCGCCGCTCGCTCCGGTCAAGGCGCGGCCTGAGGCGGTCAGA
32 ATLGPHAALSALEAGSEALR-
GTGATGGAGTTCCAGCGCGAACATGAGGTTGGCCAGCTTGGACTGGTTGTAGGCCCGGTA
51760 + + + + + + 51819
CACTACCTCAAGGTCGCGCTTGTACTCCAACCGGTCGAACCTGACCAACATCCGGGCCAT
32 HHLELAFMLNALKSQNYARY-
CCGGCTGTAGCGGCGTTCGCCGTGAAGGTCGCTGAAGTCGATGCGCCCCAGCCGGTGCAG
51820 + + + + + + 51879
GGCCGACATCGCCGCAAGCGGCACTTCCAGCGACTTCAGCTACGCGGGGTCGGCCACGTC
32 RSYRREGHLDSFDIRGLRHL-
ATAGCTGCTGATCGTCACGACCCGCGCGCCCGGCGCGGCCCGCAGGCTGTCCAGGAGCAG
51880 + + + + + + 51939
TATCGACGACTAGCAGTGCTGGGCGCGCGGGCCGCGCCGGGCGTCCGACAGGTCCTCGTC
32 YSSITVVRAGPAARLSDLLL-
GCCGGTGAGGGCGAAGTGCCCCAGGTGGTTCGTGGCGAACTGGAGTTCGTGACCGTCCGG
51940 + + + + + + 51999
CGGCCACTCCCGCTTCACGGGGTCCACCAAGCACCGCTTGACCTCAAGCACTGGCAGGCC
32 GTLAFHGLHNTAFQLEHGDP-
GGTGCGGGCCCGGTCGGTCCACATCACGCCCGCGTTGTTGACCAGCAGGTGGATGCGCGG
52000 + + + + + + 52059
CCACGCCCGGGCCAGCCAGGTGTAGTGCGGGCGCAACAACTGGTCGTCCACCTACGCGCC
TRARDTWMVGANNVLLHIRP-
GAAGCGGTCGCGCAGTTCCTCGGCGCCGGCACGCACCGACGCGAGACGGGAAAGATCCAG
52060 + + + + + + 52119
CTTCGCCAGCGCGTCAAGGAGCCGCGGCCGTGCGTGGCTGCGCTCTGCCCTTTCTAGGTC
FRDRLEEAGARVSALRSLDL-
CCGTCTGACCGTCAGTTGCGCCGACGGCACCCGGCTTTGGATGCGGGCCGCCGCGGCGAC
52120 + + + + + + 52179
GGCAGACTGGCAGTCAACGCGGCTGCCGTGGGCCGAAACCTACGCCCGGCGGCGCCGCTG
RRVTLQASPVRSQIRAAAAV-
CCCGCGGTCCGGATCGCGCACGGCCAGCACCACGTGGGCGCCGTGCCGGGCGAGCTCCTG
52180 + + + + + + 52239
GGGCGCCAGGCCTAGCGCGTGCCGGTCGTGGTGCACCCGCGGCACGGCCCGCTCGAGGAC
GRDPDRVALVVHAGHRALEQ-
CGCCAGGTGCAGTCCGATGCCGGAGCTGGCACCGGTGACCACCGCGGTGGTTCCGGTACG
52240 + + + + + + 52299
GCGGTCCACGTCAGGCTACGGCCTCGACCGTGGCCACTGGTGGCGCCACCAAGGCCATGC
ALHLGIGSSAGTVVATTGTR-
GTCCGGGACATCGGCGGCGCTCCAGCGTCGCCGCGTTCTCATCGGTCGTCCCTCCCGGGG
52300 + + + + + + 52359
CAGGCCCTGTAGCCGCCGCGAGGTCGCAGCGGCGCAAGAGTAGCCAGCAGGGAGGGCCCC
DPVDAASWRRRTRM (ORF32)
GATGCGTCAGCCGGCCTGGGCCATCGCGGCCCGGTAGCCGTTGGCGACGATCTGCCGGGC
52360 + + + + + + 52419
CTACGCAGTCGGCCGGACCCGGTAGCGCCGGGCCATCGGCAACCGCTGCTAGACGGCCCG
GGAGTGCTCGTAGTACTCGTCGTCCTTCGGCAGCTCCGTGGCGAGACCGCTGACGTACCG
52420 + + + + + + 52479
CCTCACGAGCATCATGAGCAGCAGGAAGCCGTCGAGGCACCGCTCTGGCGACTGCATGGC
GTTGAACATGCAGAACGCGGCGGCGATCAGAACGGTGTCGTGCAGAGCGGTGTCGTCCGC
52480 + + + + + + 52539
CAACTTGTACGTCTTGCGCCGCCGCTAGTCTTGCCACAGCACGTCTCGCCACAGCAGGCG
TCCCTCGGCCCGCGCCGAGGCGATCACCCCTGCGGAGACCGGGCGCGCCGCGCTCTGGAC
52540 + + + + + + 52599
AGGGAGCCGGGCGCGGCTCCGCTAGTGGGGACGCCTCTGGCCCGCGCGGCGCGAGACCTG
CTCGGCGGCGACGGCCAGCAGCGCGCGCGTCCTGCCGTCGATGGGCGCGGTGGCGGGGTC
52600 + + + + + + 52659
GAGCCGCCGCTGCCGGTCGTCGCGCGCGCAGGACGGCAGCTACCCGCGCCACCGCCCCAG
GGCGAGGACGGCCTCGACGAGCTGCCGGCCTCCCGGCAGCTGCGCGGCGGCGAAGGCCCC
52660 + + + + + + 52719
CCGCTCCTGCCGGAGCTGCTCGACGGCCGGAGGGCCGTCGACGCGCCGCCGCTTCCGGGG
GTGGGAGGCGGCGCAGAACTCGGTGGAGTTGAGATGCGAGACGTACGCCGCGATGAGCTC
52720 + + + + + + 52779
CACCCTCCGCCGCGTCTTGAGCCACCTCAACTCTACGCTCTGCATGCGGCGCTACTCGAG
GCGTTGCCCCGGTTCCAGCGAGGACGGCGCCCGCAGCAGGGCGTTCGCGAGATCGCCCAG
52780 + + + + + + 52839
CGCAACGGGGCCAAGGTCGCTCCTGCCGCGGGCGTCGTCCCGCAAGCGCTCTAGCGGGTC
CGGTGCTGCGGTGCCGGGGTGGTGAGCCATCAGACCACTGATGCCGGGGAGGTCGTTGTC
52840 + + + + + + 52899
GCCACGACGCCACGGCCCCACCACTCGGTAGTCTGGTGACTACGGCCCCTCCAGCAACAG
GAGTGCTATGTGGGGCACGGCTCTTCCTTCCGGGTGGACGAGGGGCGGACGGCGGCGGAT
52900 + + + + + + 52959
CTCACGATACACCCCGTGCCGAGAAGGAAGGCCCACCTGCTCCCCGCCTGCCGCCGCCTA
CAGGGCCATTCGACTTCGTCGTCGGCGGCCGCGCAGATGCGGGTGAAGGGCCATTCCACG
52960 + + + + + + 53019
GTCCCGGTAAGCTGAAGCAGCAGCCGCCGGCGCGTCTACGCCCACTTCCCGGTAAGGTGC
TCTTCCCCTCCCGTTGCGGAGTGGGCGGAGGCCGTGGTGAAGAGGGTGACGAGTCCGAAC
53020 + + + + + + 53079
AGAAGGGGAGGGCAACGCCTCACCCGCCTCCGGCACCACTTCTCCCACTGCTCAGGCTTG
GTGCCGAAGAGGAGGGACAGTCGGGCAACGTGAAGTGCGGTACCCATGCGAGCTCCTAGC
53080 + + + + + + 53139
CACGGCTTCTCCTCCCTGTCAGCCCGTTGCACTTCACGCCATGGGTACGCTCGAGGATCG
GAGGGCGGCGTGACCGCGGGACGGTGAGACCTCGTGATGCCAGGAAGCTAGCGAATCGGA
53140 + + + + + + 53199
CTCCCGCCGCACTGGCGCCCTGCCACTCTGGAGCACTACGGTCCTTCGATCGCTTAGCCT
CTGAGGGTGGCAACGATATGCCAGACTTTGGCAACTTGCCTGTGTATCAGCCGGACTGTC
53200 + + + + + + 53259
GACTCCCACCGTTGCTATACGGTCTGAAACCGTTGAACGGACACATAGTCGGCCTGACAG
(ORF33) V Y Q P D C R
GGCCGCTGGTAAAGACGGAACGGCGAGATCCCGCGACCGCGTCGCAGAGCAGCAGGGTCT
53260 + + + + + + 53319
CCGGCGACCATTTCTGCCTTGCCGCTCTAGGGCGCTGGCGCAGCGTCTCGTCGTCCCAGA
PLVKTERRDPATASQSSRVC-
GCTCACCCAGCGTCGGGGCGGCCAGCATGTCGCGTACCGGGAGCGTGACGCCCAGCTCGC
53320 + + + + + + 53379
CGAGTGGGTCGCAGCCCCGCCGGTCGTACAGCGCATGGCCCTCGCACTGCGGGTCGAGCG
SPSVGAASMSRTGSVTPSSR-
GGTTGATCCTGCGGACCAGCCGGGTGATGAGCAGGGAGTCGCCGCCGTGGGCGAAGAAAT
53380 + + + + + + 53439
CCAACTAGGACGCCTGGTCGGCCCACTACTCGTCCCTCAGCGGCGGCACCCGCTTCTTTA
LILRTSRVMSRESPPWAKKS-
CAGCACCTTCGGAGGGGTCCGGGAAGCCGAGCAGGTCACCCCAGCCGCGCACCAGTACCT
53440 + + + + + + 53499
GTCGTGGAAGCCTCCCCAGGCCCTTCGGCTCGTCCAGTGGGGTCGGCGCGTGGTCATGGA
APSEGSGKPSRSPQPRTSTW-
GGCGGATGTCGCCGGTGGTGACGACCGTGCGCCGGGAGCCCCGACGTGCCGAGCGCAGCC
53500 + + + + + + 53559
CCGCCTACAGCGGCCACCACTGCTGGCACGCGGCCCTCGGGGCTGCACGGCTCGCGTCGG
RMSPVVTTVRREPRRAERSR-
GCGAGGCATGCACCAGCGCCACCTGGTCGCCGAGGTTGCGCCGCGACAGCTCGCGCAGCG
53560 + + + + + + 53619
CGCTCCGTACGTGGTCGCGGTGGACCAGCGGCTCCAACGCGGCGCTGTCGAGCGCGTCGC
EACTSATWSPRLRRDSSRSD-
ACACCGTGACGCCGAACCTCTCGGTGATCCTGCGGACCAGCCGCGTGATCAGCAGCGTGT
53620 + + + + + + 53679
TGTGGCACTGCGGCTTGGAGAGCCACTAGGACGCCTGGTCGGCGCACTAGTCGTCGCACA
TVTPNLSVILRTSRVISSVS-
CCCCGCCGCGCGCGAAGAAATCCGAATGCTCGGTGAGGTCGGAGCGGCCGAGGAGCTCGC
53680 + + + + + + 53739
GGGGCGGCGCGCGCTTCTTTAGGCTTACGAGCCACTCCAGCCTCGCCGGCTCCTCGAGCG
PPRAKKSECSVRSERPRSSL-
TCCACGCGCCGACCATGAACTCCCCCACGTCACCGAGCCGGTGCTCGTCGCCGTCGGGGC
53740 + + + + + + 53799
AGGTGCGCGGCTGGTACTTGAGGGGGTGCAGTGGCTCGGCCACGAGCAGCGGCAGCCCCG
HAPTMNSPTSPSRCSSPSGP-
CCTTCGGCGCGCCGGATCCCGCGGAACGGTTCCGGCCGGAGACGGCAGAGCGGTCACTGG
53800 + + + + + + 53859
GGAAGCCGCGCGGCCTAGGGCGCCTTGCCAAGGCCGGCCTCTGCCGTCTCGCCAGTGACC
FGAPDPAERFRPETAERSLV-
TCACTTTCGCCACCTCCAGGGGCATGTGTCGGCTGCATCGGCTTCCCGCCACGGTACGGG
53860 + + + + + + 53919
AGTGAAAGCGGTGGAGGTCCCCGTACACAGCCGACGTAGCCGAAGGGCGGTGCCATGCCC
TFATSRGMCRLHRLPATVRE-
AGCACATGTTGCATGGCAATACCTTTCCAAGTCGGTGGCAACCCTCCTTGCCATCCACCC
53920 + + + + + + 53979
TCGTGTACAACGTACCGTTATGGAAAGGTTCAGCCACCGTTGGGAGGAACGGTAGGTGGG
HMLHGNTFPSRWQPSLPSTH-
ACTGCAGTTGGGCGAGATGTGTAGGCATTCGAGGTCCGCAGGTTTGCCAAGCCGCGCGCG
53980 + + + + + + 54039
TGACGTCAACCCGCTCTACACATCCGTAAGCTCCAGGCGTCCAAACGGTTCGGCGCGCGC
CSWARCVGIRGPQVCQAARD-
ACCGGCATACTCTCTGGCACAACTGGAATGAGTAGCGTGGCAGGCCACGGGGACCGGGCC
54040 + + + + + + 54099
TGGCCGTATGAGAGACCGTGTTGACCTTACTCATCGCACCGTCCGGTGCCCCTGGCCCGG
RHTLWHNWNE* (ORF33)
GGGCCAGGAACCTTCGTCCTCCATCTATTCGCTGGGGCGTGCACGTGTTGGAGCAGCCAT
54100 + + + + + + 54159
CCCGGTCCTTGGAAGCAGGAGGTAGATAAGCGACCCCGCACGTGCACAACCTCGTCGGTA
CTTTCGGCCGTCGCCTGAGGCAGCTGAGGACCGAGCGGGGTCTTTCCCAGGCCGCGCTCG
54160 + + + + + + 54219
GAAAGCCGGCAGCGGACTCCGTCGACTCCTGGCTCGCCCCAGAAAGGGTCCGGCGCGAGC
CGGGGGACGGCATGTCTACGGGCTATCTCTCGCGCCTGGAGTCGGGCGCCCGGCAGCCCT
54220 + + + + + + 54279
GCCCCCTGCCGTACAGATGCCCGATAGAGAGCGCGGACCTCAGCCCGCGGGCCGTCGGGA
(ORF34) MSTGYLSRLESGARQPS-
CCGATCGCGCCGTCGCCCACCTGGCCGGACAACTCGGCATCAGCCCGTCGGAGTTCGAAG
54280 + + + + + + 54339
GGCTAGCGCGGCAGCGGGTGGACCGGCCTGTTGAGCCGTAGTCGGGCAGCCTCAAGCTTC
DRAVAHLAGQLGISPSEFEG-
GGTCCCGGGCCACCTCGCTCGCCCAGATCCTCTCCCTCTCCACTTCCCTGGAGTCCGACG
54340 + + + + + + 54399
CCAGGGCCCGGTGGAGCGAGCGGGTCTAGGAGAGGGAGAGGTGAAGGGACCTCAGGCTGC
SRATSLAQILSLSTSLESDE-
AGACCAGTGAGCTTCTCGCCGAGGCGGTACGTTCCGCGCATGGCCAGGATCCGATGCTCC
54400 + + + + + + 54459
TCTGGTCACTCGAAGAGCGGCTCCGCCATGCAAGGCGCGTACCGGTCCTAGGCTACGAGG
TSELLAEAVRSAHGQDPMLR-
GCTGGCAGGCCCTGTGGCTGCTGGGACAGTGGAAGCGCCGGCACGGCGACTCGGCCGGCG
54460 + + + + + + 54519
CGACCGTCCGGGACACCGACGACCCTGTCACCTTCGCGGCCGTGCCGCTGAGCCGGCCGC
WQALWLLGQWKRRHGDSAGE-
AGCACGGCTACCTCCAGCGTCTGGTGACGCTGAGTGAGGAGATCGGCCTGGCCGAGTTGC
54520 + + + + + + 54579
TCGTGCCGATGGAGGTCGCAGACCACTGCGACTCACTCCTCTAGCCGGACCGGCTCAACG
HGYLQRLVTLSEEIGLAELR-
GCGCACGGGCCCTGACCCAGTTCGCCCGGTCGCTGCGGGTACTGGGCGAGATCGTTCCGG
54580 + + + + + + 54639
CGCGTGCCCGGGACTGGGTCAAGCGGGCCAGCGACGCCCATGACCCGCTCTAGCAAGGCC
34 ARALTQFARSLRVLGEIVPA-
CGGTGGAGGCTGCCGCCGCCGCCCACCGGCTCGCGGTGGACCATGCGCTGTCCAGCCAGG
54640 + + + + + + 54699
GCCACCTCCGACGGCGGCGGCGGGTGGCCGAGCGCCACCTGGTACGCGACAGGTCGGTCC
34 VEAAAAAHRLAVDHALSSQD-
ACAGGGCCGCTTCGCTGCTGGTTCTGGTGTCGGTGGAGGCCGAGGCGGGACGGATGCCCG
54700 + + + + + + 54759
TGTCCCGGCGAAGCGACGACCAAGACCACAGCCACCTCCGGCTCCGCCCTGCCTACGGGC
34 RAASLLVLVSVEAEAGRMPD-
ACGCCCGGCGCCACGCCGACGAACTGACCGTCCTGGTGAGGGGACGGTCCGACACTCTGT
54760 + + + + + + 54819
TGCGGGCCGCGGTGCGGCTGCTTGACTGGCAGGACCACTCCCCTGCCAGGCTGTGAGACA
34 ARRHADELTVLVRGRSDTLW-
GGGCCGAGGCGTTGTGGACGGCGGGTGCGTTGAAGGTGCGGCAGGGCGAGTTCGCCGCGG
54820 + + + + + + 54879
CCCGGCTCCGCAACACCTGCCGCCCACGCAACTTCCACGCCGTCCCGCTCAAGCGGCGCC
34 AEALWTAGALKVRQGEFAAA-
CCGAGGTCCTTTTCCAGGAGGCTCTGGACGGGTTCGACAGCCGGGAGAACCTGACGATCT
54880 + + + + + + 54939
GGCTCCAGGAAAAGGTCCTCCGAGACCTGCCCAAGCTGTCGGCCCTCTTGGACTGCTAGA
34 EVLFQEALDGFDSRENLTIW-
GGCTGCGGCTGCGCATCGCGATGGCCGAACTCCACCTGCAGAAACTTCCTCCCGAGCCCG
54940 + + + + + + 54999
CCGACGCCGACGCGTAGCGCTACCGGCTTGAGGTGGACGTCTTTGAAGGAGGGCTCGGGC
34 LRLRIAMAELHLQKLPPEPD-
ACGCCGCGCAGCTCTGCATCGAGGCGGCGGAGGCGGCCCTTCCCTTTGCCCGCACATCCG
55000 + + + + + + 55059
TGCGGCGCGTCGAGACGTAGCTCCGCCGCCTCCGCCGGGAAGGGAAACGGGCGTGTAGGC
34 AAQLCIEAAEAALPFARTSA-
CTCTGGAACAGTCCCTCGCCGCTCTGCGGGCGCGCCTCGCCTTCCATGAGGGCAGGTTCG
55060 + + + + + + 55119
GAGACCTTGTCAGGGAGCGGCGAGACGCCCGCGCGGAGCGGAAGGTACTCCCGTCCAAGC
34 LEQSLAALRARLAFHEGRFA-
CCGATGCCCGCGCGTTGTTGGAGAGGCTCGGCAGGACCGAGCTCCGGCTGCCCTATCAGA
55120 + + + + + + 55179
GGCTACGGGCGCGCAACAACCTCTCCGAGCCGTCCTGGCTCGAGGCCGACGGGATAGTCT
34 DARALLERLGRTELRLPYQS-
GCCGGATCCGCCTGGAGGTCCTCGGTCATCAGCTGCGCATCCTGAGCGGGGAGGAGGAGG
55180 + + + + + + 55239
CGGCCTAGGCGGACCTCCAGGAGCCAGTAGTCGACGCGTAGGACTCGCCCCTCCTCCTCC
34 RIRLEVLGHQLRILSGEEEE-
AAGGCCTGGCCGGCCTCCAGCTCCTGGCCGAGGAGGCGCAGGAGAACTCCAACATCAACC
55240 + + + + + + 55299
TTCCGGACCGGCCGGAGGTCGAGGACCGGCTCCTCCGCGTCCTCTTGAGGTTGTAGTTGG
34 GLAGLQLLAEEAQENSNINL-
TCGCCGCGGAGATCTGGCGGCTCGCGGCGGAATGCCTGATGCGGGCGCGCGGGAAGGTCC
55300 + + + + + + 55359
AGCGGCGCCTCTAGACCGCCGAGCGCCGCCTTACGGACTACGCCCGCGCGCCCTTCCAGG
34 AAEIWRLAAECLMRARGKVR-
GCGGCGCCACCGGCGGCTGACGCCGCGCCGGTTCGCGAGGTCCACCGCGCCGCCGTGGCC
55360 + + + + + + 55419
CGCCGCGGTGGCCGCCGACTGCGGCGCGGCCAAGCGCTCCAGGTGGCGCGGCGGCACCGG
34 G A T G G * (ORF34)
ACCGCCGTCGGCGTGAGGCGCCGGCGTGTGCCGCCCCCCACGGTTGCTCGCCCTTGGTGG
55420 + + + + + + 55479
TGGCGGCAGCCGCACTCCGCGGCCGCACACGGCGGGGGGTGCCAACGAGCGGGAACCACC
TGCATCTGTTGGCACATGTGTACCTCCTACACAGTCAATTGTTGCCAAAATTGTCGAACC
55480 + + + + + + 55539
ACGTAGACAACCGTGTACACATGGAGGATGTGTCAGTTAACAACGGTTTTAACAGCTTGG
GAATGGCAATTGCTTGCCTTTGCTGAAGAGGCGTGCTGATATGCAAGTCAAGTAGCCTCC
55540 + + + + + + 55599
CTTACCGTTAACGAACGGAAACGACTTCTCCGCACGACTATACGTTCAGTTCATCGGAGG
TCCGATCTCGGGCGGCCATATGGGAAACATCGAGTTGAGCGGCGATGGCGTTCGTCAGTG
55600 + + + + + + 55659
AGGCTAGAGCCCGCCGGTATACCCTTTGTAGCTCAACTCGCCGCTACCGCAAGCAGTCAC
CTGCCGTTCTGGCCAGGCAACTGATGTCGATGGGGATGGCAAGATTTTGCCGAAAACCGA
55660 + + + + + + 55719
GACGGCAAGACCGGTCCGTTGACTACAGCTACCCCTACCGTTCTAAAACGGCTTTTGGCT
TACATCTCTGTCCGTCCCGGACAGCCTTCGCCCCCCGGGTGACACTGCTCCGGCATGGCT
55720 + + + + + + 55779
ATGTAGAGACAGGCAGGGCCTGTCGGAAGCGGGGGGCCCACTGTGACGAGGCCGTACCGA
CCGGTTTCTCGTCGCCCGGCCGACGGACCGCACCGTCCGGAACGAGGCGCCGGTGTGCGT
55780 + + + + + + 55839
GGCCAAAGAGCAGCGGGCCGGCTGCCTGGCGTGGCAGGCCTTGCTCCGCGGCCACACGCA
CCGCTGATGGGCACAGCGGCCTCGGCCGCAGCAGGTTCCCACCGAGAAGAATGCCGAGGC
55840 + + + + + + 55899
GGCGACTACCCGTGTCGCCGGAGCCGGCGTCGTCCAAGGGTGGCTCTTCTTACGGCTCCG
CCAGCCGTGAACCACGACATGTCCCAGCGTGCCTTGCTGGAGGCGGCGGCCGAGGGGCTG
55900 + + + + + + 55959
GGTCGGCACTTGGTGCTGTACAGGGTCGCACGGAACGACCTCCGCCGCCGGCTCCCCGAC
CGGCGGCTGGCCGGCGACGCGCGGTGCCGGAGCGCGTCGGCCGCGCCCTCCTCGGCATTG
55960 + + + + + + 56019
GCCGCCGACCGGCCGCTGCGCGCCACGGCCTCGCGCAGCCGGCGCGGGAGGAGCCGTAAC
AGGGACATGTTCTCCCCCGCCGCCCGCCGGTACGTGCTCGCCTCGGACCGCGCGGGGTTC
56020 + + + + + + 56079
TCCCTGTACAAGAGGGGGCGGCGGGCGGCCATGCACGAGCGGAGCCTGGCGCGCCCCAAG
35 (ORF35) MFSPAARRYVLASDRAGF
TTCGAGCAGGCTGTCCGGCTGCGCTCCCGGGGGTACCGGGTGAGCGCGGAGTTCGTCGGC
56080 + + + + + + 56139
AAGCTCGTCCGACAGGCCGACGCGAGGGCCCCCATGGCCCACTCGCGCCTCAAGCAGCCG
35 FEQAVRLRSRGYRVSAEFVG
CCCGATCAGGGAGCCACCGACGCCCTCCACGCGGAGCACGTGGTCGAAGAGCACCTGAGG
56140 + + + + + + 56199
GGGCTAGTCCCTCGGTGGCTGCGGGAGGTGCGCCTCGTGCACCAGCTTCTCGTGGACTCC
35 PDQGATDALHAEHVVEEHLR
CTGCTCGATCAGGAGCCGGCCCCTGACCGGATCGGTGTGGACGTCTCCCGGATCGGCCTC
56200 + + + + + + 56259
GACGAGCTAGTCCTCGGCCGGGGACTGGCCTAGCCACACCTGCAGAGGGCCTAGCCGGAG
35 LLDQEPAPDRIGVDVSRIGL
GCCCACTCGGCGCAGACTGCCCTGCGCAACACCGGGCGGCTGGCTGCCGCTGCGGCGCTC
56260 + + + + + + 56319
CGGGTGAGCCGCGTCTGACGGGACGCGTTGTGGCCCGCCGACCGACGGCGACGCCGCGAG
AHSAQTALRNTGRLAAAAAL
CGCGGGAGCGAGGTCGTCCTGCTCATGGAGGGGTCCGAGGACATCGACACCGTGCTGGCC
56320 + + + + + + 56379
GCGCCCTCGCTCCAGCAGGACGAGTACCTCCCCAGGCTCCTGTAGCTGTGGCACGACCGG
RGSEVVLLMEGSEDIDTVLA
GTCCATGACGCCCTGGTGAACCGTTACGACAACGTGGGGATCACCCTTCAGGCGCACCTG
56380 + + + + + + 56439
CAGGTACTGCGGGACCACTTGGCAATGCTGTTGCACCCCTAGTGGGAAGTCCGCGTGGAC
VHDALVNRYDNVGITLQAHL
CACCGCACCGTGGACGACGCCATGGCGGTCGCGGGTCCTGGCCGCACCGTGCGGCTGGTC
56440 + + + + + + 56499
GTGGCGTGGCACCTGCTGCGGTACCGCCAGCGCCCAGGACCGGCGTGGCACGCCGACCAG
HRTVDDAMAVAGPGRTVRLV
ATGGGCTCCTCGGCCGAGCCTGCCGGCACCGCTCTGTCCCGGGGCCCCGCTCTGGAGGAC
56500 + + + + + + 56559
TACCCGAGGAGCCGGCTCGGACGGCCGTGGCGAGACAGGGCCCCGGGGCGAGACCTCCTG
MGSSAEPAGTALSRGPALED
CGGTACCTTGACCTCGCGGAGCTTCTCGTGGACCGTGGCGTCCGGCTGAGTCTGGCCACT
56560 + + + + + + 56619
GCCATGGAACTGGAGCGCCTCGAAGAGCACCTGGCACCGCAGGCCGACTCAGACCGGTGA
RYLDLAELLVDRGVRLSLAT
CCGGACGCCGAGGTCCTGGCCGGGGCGCAGGAGCGTGGTCTGCTCGAACGCGTCCAGGAC
56620 + + + + + + 56679
GGGCTGCGGCTCCAGGACCGGCCCCGCGTCCTCGCACCAGACGAGCTTGCGCAGGTCCTG
PDAEVLAGAQERGLLERVQD
ATCGAGATGCTCTACGGTGTGCGGCCCGAGCTGCTGCGCCGCCACCGGGCGGCGGGCCGC
56680 + + + + + + 56739
TAGCTCTACGAGATGCCACACGCCGGGCTCGACGACGCGGCGGTGGCCCGCCGCCCGGCG
IEMLYGVRPELLRRHRAAGR
CCCTGTCGCATCCACGCGGCCTACGGGATGAACTGGTGGCTTCCCCTGCTGCGGAGGCTG
56740 + + + + + + 56799
GGGACAGCGTAGGTGCGCCGGATGCCCTACTTGACCACCGAAGGGGACGACGCCTCCGAC
PCRIHAAYGMNWWLPLLRRL
GCCGACAACCCGCCGATGGTGCTCAACGCCCTGGCCGACATCGGCCGGGACCGGGAGCCC
56800 + + + + + + 56859
CGGCTGTTGGGCGGCTACCACGAGTTGCGGGACCGGCTGTAGCCGGCCCTGGCCCTCGGG
ADNPPMVLNALADIGRDREP
GTCGCCCACCAGGCGTACTGACCCGCCCCGGGCCGCGATCCGCGGGGCACCGGCCCCGGG
56860 + + + + + + 56919
CAGCGGGTGGTCCGCATGACTGGGCGGGGCCCGGCGCTAGGCGCCCCGTGGCCGGGGCCC
V A H Q A Y * (ORF35)
GCGCCGGTCAGCTCCCGGTCGCCGCGAACTGCCCGGGCCTGCGCCCCTCGCCCGCCGGCC
56920 + + + + + + 56979
CGCGGCCAGTCGAGGGCCAGCGGCGCTTGACGGGCCCGGACGCGGGGAGCGGGCGGCCGG
(ORF36) * SGTAAFQGPRRGEGAP
CCCGGTAGGCCTGGGCGATGTCCAGCCACTTCTCCGCCTCCTGACCAGACGCGGTCAGGG
56980 + + + + + + 57039
GGGCCATCCGGACCCGCTACAGGTCGGTGAAGAGGCGGAGGACTGGTCTGCGCCAGTCCC
36 GRYAQAIDLWKEAEQGSATL
CGAGGTCGTCGCGGTGGCGGCGCCGGGTGACCAGCAGGCAGAAGTCGTGCGCGGGACCGC
57040 + + + + + + 57099
GCTCCAGCAGCGCCACCGCCGCGGCCCACTGGTCGTCCGTCTTCAGCACGCGCCCTGGCG
36 ALDDRHRRRTVLLCFDHAPG
TGACCGTCTCGGTGGCGTCCTCGGGGCCGACCGTCCAGACCTCGCCCGAGGGGGCGGTGA
57100 + + + + + + 57159
ACTGGCAGAGCCACCGCAGGAGCCCCGGCTGGCAGGTCTGGAGCGGGCTCCCCCGCCACT
36 SVTETADEPGVTWVEGSPAT
GCTCGAAGCGGAACGGCGCGGCCGGCGGGGTCAGACCGTGGGACTCGTAGCCGAAGTCGC
57160 + + + + + + 57219
CGAGCTTCGCCTTGCCGCGCCGGCCGCCCCAGTCTGGCACCCTGAGCATCGGCTTCAGCG
36 LEPRFPAAPPTLGHSEYGFD
GTGTCAGCCAGGCGAAGTCGACGATGTTGCGAAGCCGCTCGGTGGGCGTGCGCCGGACAC
57220 + + + + + + 57279
CACAGTCGGTCCGCTTCAGCTGCTACAACGCTTCGGCGAGCCACCCGCACGCGGCCTGTG
36 RTLWAFDVINRLRETPTRRV
CCAGGGCGTCGGCGACGTCCTGGCCGTGGGCGAACACCTCCATGATCCCGGCGCAGCCCA
57280 + + + + + + 57339
GGTCCCGCAGCCGCTGCAGGACCGGCACCCGCTTGTGGAGGTACTAGGGCCGCGTCGGGT
36 GLADAVDQGHAFVEMIGACG
GAACGACCGGCGGCAGCGGGTTGACCAGCCACGGAACCACCTGGCCGGCGGGGACCGCGG
57340 + + + + + + 57399
CTTGCTGGCCGCCGTCGCCCAACTGGTCGGTGCCTTGGTGGACCGGCCGCCCCTGGCGCC
36 LVVPPLPNVLWPVVQGAPVA
CGAGCGCCTCGACCGAGGCCCGCCCCATGCCCCGGAAGCGGGTGAGCAGTTCCTGCGGCG
57400 + + + + + + 57459
GCTCGCGGAGCTGGCTCCGGGCGGGGTACGGGGCCTTCGCCCACTCGTCAAGGACGCCGC
36 ALAEVSARGMGRFRTLLEQP
GGAAGCCCTTGAACTGCTGCAGAGCCGCGTTGACCGCTCCGTCGAAGTTGCCTGCCGCGG
57460 + + + + + + 57519
CCTTCGGGAACTTGACGACGTCTCGGCGCAACTGGCGAGGCAGCTTCAACGGACGGCGCC
36 PFGKFQQLAANVAGDFNGAA
CGGCCGTGACGGCCTTGAACTCCTCCGGCGCCGCCGCCGCGGTCCTGGCCAGGTTGAAGA
57520 + + + + + + 57579
GCCGGCACTGCCGGAACTTGAGGAGGCCGCGGCGGCGGCGCCAGGACCGGTCCAACTTCT
36 AATVAKFEEPAAAATRALNF
CGAAGGTGAGGTGGGCGATCTGGTCGGTGACGGTCCAGCCGGGCGCCGGCGTCGGAGTGT
57580 + + + + + + 57639
GCTTCCACTCCACCCGCTAGACCAGCCACTGCCAGGTCGGCCCGCGGCCGCAGCCTCACA
36 VFTLHAIQDTVTWGPAPTPT
TCCAGGCTTCGTCGTCGATCTTCTCGACCAGCTGCGCCAGCTCCTCGATGTCGGTGGCCA
57640 + + + + + + 57699
AGGTCCGAAGCAGCAGCTAGAAGAGCTGGTCGACGCGGTCGAGGAGCTACAGCCACCGGT
36 NWAEDDIKEVLQALEEIDTA
GGTGCTTGAGGACGTCGTCGAGCGAATTCATCTCGTACTTCCTTCACTGGGGGTGTTCCG
57700 + + + + + + 57759
CCACGAACTCCTGCAGCAGCTCGCTTAAGTAGAGCATGAAGGAAGTGACCCCCACAAGGC
36 LHKLVDDLSNM (ORF36)
GGCTGGGACGGATGTCCCGCCGGGTGGGCCGGCGGCCGGCGGAAGCGCCGTCGCGGAGCG
57760 + + + + + + 57819
CCGACCCTGCCTACAGGGCGGCCCACCCGGCCGCCGGCCGCCTTCGCGGCAGCGCCTCGC
TCGGCGACAGTCGCTAGGCGGCGCGTCCCGCGTAGGAGCCGGCCCGGTCGGAATAGGGCG
57820 + + + + + + 57879
AGCCGCTGTCAGCGATCCGCCGCGCAGGGCGCATCCTCGGCCGGGCCAGCCTTATCCCGC
37 (ORF37) *AARGAYSGARDSYP
CGAGCGCCTCGGCCAGGGCTTCGGGTATCAGGGTCGGCACGGTCGCCGTGTTGGGGCCGC
57880 + + + + + + 57939
GCTCGCGGAGCCGGTCCCGAAGCCCATAGTCCCAGCCGTGCCAGCGGCACAACCCCGGCG
37 ALAEALAEPILTPVTATNPG
GCATGCAGGCGATGCGCTGGCGTCCCCGCGCCACCAGGGTCTCGCCGCCGTCGTCGCCCA
57940 + + + + + + 57999
CGTACGTCCGCTACGCGACCGCAGGGGCGCGGTGGTCCCAGAGCGGCGGCAGCAGCGGGT
37 RMCAIRQRGRAVLTEGGDDG
GCTTGATGTAGTCGAAGGTGAACTCCAGCTGGGTCTGCCGCAGCTCCGAGAGCCTCATCC
58000 + + + + + + 58059
CGAACTACATCAGCTTCCACTTGAGGTCGACCCAGACGGCGTCGAGGCTCTCGGAGTAGG
37 LKIYDFTFELQTQRLESLRM
GGATCGACAGTTCGTCGAAGGCGGTGATCTCCGCGAAGAACTCGCAGTCCACCTTGAGGG
58060 + + + + + + 58119
CCTAGCTGTCAAGCAGCTTCCGCCACTAGAGGCGCTTCTTGAGCGTCAGGTGGAACTCCC
37 RISLEDFATIEAFFECDVKL
TGAAGAGCTTGAGGTCCTCCTGGACCTCGGCGAGCACCGAAGGCGCCCTCTCCTTGAGAA
58120 + + + + + + 58179
ACTTCTCGAACTCCAGGAGGACCTGGAGCCGCTCGTGGCTTCCGCGGGAGAGGAACTCTT
37 TFLKLDEQVEALVSPAREKL
AGAGTTCCCGGCAACGCCCCTGCCAACGAAGGTAGTTGACGTAGTAGACGTTGCCGACGA
58180 + + + + + + 58239
TCTCAAGGGCCGTTGCGGGGACGGTTGCTTCCATCAACTGCATCATCTGCAACGGCTGCT
37 FLERCRGQWRLYNVYYVNGV
GGTTCGTCTCCTCGAAGCCGACGGTGTGGCGGAGCTCGAAGTAGTCAGGATTCGTCGCGG
58240 + + + + + + 58299
CCAAGCAGAGGAGCTTCGGCTGCCACACCGCCTCGAGCTTCATCAGTCCTAAGCAGCGCC
37 LNTEEFGVTHRLEFYDPNTA
TCATAGGTCTGTGCCCTTCGTCGTCGGGGCCGGTCGTCGCACCGAGTTGCGTGAAGCAAC
58300 + + + + + + 58359
AGTATCCAGACACGGGAAGCAGCAGCCCCGGCCAGCAGCGTGGCTCAACGCACTTCGTTG
37 T M (ORF37)
TCACTGGTCGCGATGGCCTGCGGGGTCGGTGGCCCGCGCTCCGGGCGGAGAGTGCGGGCG
58360 + + + + + + 58419
AGTGACCAGCGCTACCGGACGCCCCAGCCACCGGGCGCGAGGCCCGCCTCTCACGCCCGC
GGGTGCCGGCCGGCGCGGGGTCAGCCGCGCGCCGACGGCAGCAGGGGAAGAACCCTCTCG
58420 + + + + + + 58479
CCCACGGCCGGCCGCGCCCCAGTCGGCGCGCGGCTGCCGTCGTCCCCTTCTTGGGAGAGC
38 (ORF38) *GRASPLLPLVRE
CGGCCGCTCGTGGAGCCGTCGGGGGCCGGTGCGCCGTAGGTGACGGAGATACCCCGGCTC
58480 + + + + + + 58539
GCCGGCGAGCACCTCGGCAGCCCCCGGCCACGCGGCATCCACTGCCTCTATGGGGCCGAG
38 RGSTSGDPAPAGYTVSIGRS
TGCGCGGCGCGCACGATCCCCGGCATCGCGCGTTCGGCGAGCGCCGCGATGGTCATCGCG
58540 + + + + + + 58599
ACGCGCCGCGCGTGCTAGGGGCCGTAGCGCGCAAGCCGCTCGCGGCGCTACCAGTAGCGC
38 QAARVIGPMAREALAAITMA-
GGATTGACCGTCAGCGCGCCGGGAACCGACGATCCGTCGGTGACGAAGATCCCCGGGTGG
58600 + + + + + + 58659
CCTAACTGGCAGTCGCGCGGCCCTTGGCTGCTAGGCAGCCACTGCTTCTAGGGGCCCACC
38 PNVTLAGPVSSGDTVFIGPH
TCGCGGAGCTCGTTGCTGTCGTCCAGGGCGGATGTGTGGGGGTCGTCGCCCATCCGGCAG
58660 + + + + + + 58719
AGCGCCTCGAGCAACGACAGCAGGTCCCGCCTACACACCCCCAGCAGCGGGTAGGCCGTC
38 DRLENSDDLASTHPDDGMRC
GAGGAGAGCGGGTGGACGGTGTAGGCGCCGACGAGGTCGTTGGTCCAGGGCATGACCTTG
58720 + + + + + + 58779
CTCCTCTCGCCCACCTGCCACATCCGCGGCTGCTCCAGCAACCAGGTCCCGTACTGGAAC
38 SSLPHVTYAGVLDNTWPMVK-
GCCAGGCCGTCCTTCTCCAGGATCTCCTTGACCTCGGCGTCGGATGCGGCCCAGGCGCCC
58780 + + + + + + 58839
CGGTCCGGCAGGAAGAGGTCCTAGAGGAACTGGAGCCGCAGCCTACGCCGGGTCCGCGGG
38 ALGDKELIEKVEADSAAWAG
AGGGTGTTCTTCGTCGGGTCGTAGCGCAGGTTGCCCCGGCCGAGCATCTGCTGGGAGATG
58840 + + + + + + 58899
TCCCACAAGAAGCAGCCCAGCATCGCGTCCAACGGGGCCGGCTCGTAGACGACCCTCTAC
38 LTNKTPDYRLNGRGLMQQSI
CGGTGGGCGTTACCGGTGGCGGGAGGGGGGCCGAAGACGCCTTCGTTGTCGTCCTCGATC
58900 + + + + + + 58959
GCCACCCGCAATGGCCACCGCCCTCCCCCCGGCTTCTGCGGAAGCAACAGCAGGAGCTAG
38 RHANGTAPPPGFVGENDDEI-
ATCGTGAAGATCGTGAGCCAGGAGGTCCACTGCTTCAGGATCTCCTTCTTCTCCTTGCCG
58960 + + + + + + 59019
TAGCACTTCTAGCACTCGGTCCTCCAGGTGACGAAGTCCTAGAGGAAGAAGAGGAACGGC
38 MTFITLWSTWQKLIEKKEKG-
AACCAGGAGGGGCCCGTGGCGCCGGGCACCTGGGCGAGGATCGTGCCGAGGCCCGGCGGG
59020 + + + + + + 59079
TTGGTCCTCCCCGGGCACCGCGGCCCGTGGACCCGCTCCTAGCACGGCTCCGGGCCGCCC
38 FWSPGTAGPVQALITGLGPP
AAGTAGAGCTGTTCCAGGGAGTAGCGGGAGTACTCGGGCAACGAGCCGTCCAGCCTGTCC
59080 + + + + + + 59139
TTCATCTCGACAAGGTCCCTCATCGCCCTCATGAGCCCGTTGCTCGGCAGGTCGGACAGG
38 FYLQELSYRSYEPLSGDLRD-
CAGCTCGCCACGGTGGGCCCCTTGCCGATCTGGTTGGCCGCGTAGGCGAGCCCGTCGCCC
59140 + + + + + + 59199
GTCGAGCGGTGCCACCCGGGGAACGGCTAGACCAACCGGCGCATCCGCTCGGGCAGCGGG
38 WSAVTPGKGIQNAAYALGDG-
CGGTCCAGGCCGAACAGCTCGGCCGCCTTGGCCTCGTCGATGATGGCGGTGTTGAGCCGC
59200 + + + + + + 59259
GCCAGGTCCGGCTTGTCGAGCCGGCGGAACCGGAGCAGCTACTACCGCCACAACTCGGCG
38 RDLGFLEAAKAEDIIATNLR-
TCGCCGTTGCCGGAGAAGTAGCGTCCGACCGCTCGTGGCATGGTGCCCAGGTGGGCCTCG
59260 + + + + + + 59319
AGCGGCAACGGCCTCTTCATCGCAGGCTGGCGAGCACCGTACCACGGGTCCACCCGGAGC
38 EGNGSFYRGVARPMTGLHAE
CTGCGCTGGAGGATCACCGGGGTCGCGCCCGCGCCGGCCGCCATCACCACGATCTTCGCC
59320 + + + + + + 59379
GACGCGACCTCCTAGTGGCCCCAGCGCGGGCGCGGCCGGCGGTAGTGGTGCTAGAAGCGG
38 SRQLIVPTAGAGAAMVVIKA-
TCGATGACGCCGCTGCCCGCCTGGAGGCGGTAGTCGTCGTCGTGCACGACGTTGTAGTGC
59380 + + + + + + 59439
AGCTACTGCGGCGACGGGCGGACCTCCGCCATCAGCAGCAGCACGTGCTGCAACATCACG
38 EIVGSGAQLRYDDDHVVNYH
ACCCGGTAGGAGCCGTCGGGGGTGCGCGAGAGGTGCTGGACCTCGTGCAGCGGGCGGATG
59440 + + + + + + 59499
TGGGCCATCCTCGGCAGCCCCCACGCGCTCTCCACGACCTGGAGCACGTCGCCCGCCTAC
38 VRYSGDPTRSLHQVEHLPRI-
CGCGCCCCATGGGCGATGGCGGCGGGCAGGTAGTTGACCAGCAAGGACTGCTTGGCCTCG
59500 + + + + + + 59559
GCGCGGGGTACCCGCTACCGCCGCCCGTCCATCAACTGGTCGTTCCTGACGAACCGGAGC
38 RAGHAIAAPLYNVLLSQKAE
AAGCGGCAGCCGGCCATCATCCAGTTGCAGTTCACGCACTTGGTGTTGTCGATGGCGACG
59560 + + + + + + 59619
TTCGCCGTCGGCCGGTAGTAGGTCAACGTCAAGTGCGTGAACCACAACAGCTACCGCTGC
38 FRCGAMMWNCNVCKTNDIAV-
GCGAGGGGGTTGGCGGTGCGGCCGGCGTGGTTGCACGCCGCGGCCCACAGTCCGCCGGCG
59620 + + + + + + 59679
CGCTCCCCCAACCGCCACGCCGGCCGCACCAACGTGCGGCGCCGGGTGTCAGGCGGCCGC
38 ALPNATRGAHNCAAAWLGGA-
TAGCTCACGTCGTTCCAGTCCTGCCGGGTCACGGAGAGGGACTCCTCGACACGGTCGTAC
59680 + + + + + + 59739
ATCGAGTGCAGCAAGGTCAGGACGGCCCAGTGCCTCTCCCTGAGGAGCTGTGCCAGCATG
38 YSVDNWDQRTVSLSEEVRDY-
CAGGGGTCCAGGGTTTCGCGGCTCACCGCCTGCGGCCACATCCGGCGTCCTATGGACCCC
59740 + + + + + + 59799
GTCCCCAGGTCCCAAAGCGCCGAGTGGCGGACGCCGGTGTAGGCCGCAGGATACCTGGGG
38 WPDLTERSVAQPWMRRGISG
TGCCGGTCGAAGACGAAGCGCGGGGCGCGGGGCATCGCGGCGAAGTAGACGACGCTGCCG
59800 + + + + + + 59859
ACGGCCAGCTTCTGCTTCGCGCCCCGCGCCCCGTAGCGCCGCTTCATCTGCTGCGACGGC
38 QRDFVFRPARPMAAFYVVSG-
CCGCCCACACAGTTCCCGCCGAGGATGCTCATGCCGTCCCCGACCGTGAAGTCGAACGCC
59860 + + + + + + 59919
GGCGGGTGTGTCAAGGGCGGCTCCTACGAGTACGGCAGGGGCTGGCACTTCAGCTTGCGG
38 GGVCNGGLISMGDGVTFDFA-
CTCGTGTACGAGGAGCCGAGTTTGTAGTCGTGCTCGAACTCCTTGCTCTCCAGCCACGGC
59920 + + + + + + 59979
GAGCACATGCTCCTCGGCTCAAACATCAGCACGAGCTTGAGGAACGAGAGGTCGGTGCCG
38 RTYSSGLKYDHEFEKSELWP
CCGCGTTCCAGGACGGTGACGTCGGCGCCCCCCGCCGCCAGGTGGTAGGCGGCGATGGCA
59980 + + + + + + 60039
GGCGCAAGGTCCTGCCACTGCAGCCGCGGGGGGCGGCGGTCCACCATCCGCCGCTACCGT
38 GRELVTVDAGGAALHYAAIA-
CCGCCGAATCCGCTGCCGATGACGAGGACGTCCGTGCGCTCGGCCGTGGTGCTCATGCGG
60040 + + + + + + 60099
GGCGGCTTAGGCGACGGCTACTGCTCCTGCAGGCACGCGAGCCGGCACCACGAGTACGCC
38 GGFGSGIVLVDTREATTSM
(ORF39) * A
GGCTCCCGGTGGACGTGGTGTCGGGGTGGAGGCGGGCGAACTCACGCCCGTAGCTGTAAT
60100 + + + + + + 60159
CCGAGGGCCACCTGCACCACAGCCCCACCTCCGCCCGCTTGAGTGCGGGCATCGACATTA
39 PSGTSTTDPHLRAFERGYSY
CCTTGAAGCGCCACAGGCCGTCGGCGTCCGGCATGCTCAGGCCCATGGCCTCCAGTCCCG
60160 + + + + + + 60219
GGAACTTCGCGGTGTCCGGCAGCCGCAGGCCGTACGAGTCCGGGTACCGGAGGTCAGGGC
39 DKFRWLGDADPMSLGMAELG
GATGGCCGTCCTCCATCGCCTGTGCCGTGTTGAGGTGCGCGGCCGAATCGAAGGCCATGT
60220 + + + + + + 60279
CTACCGGCAGGAGGTAGCGGACACGGCACAACTCCACGCGCCGGCTTAGCTTCCGGTACA
39 PHGDEMAQATNLHAASDFAM
TGCAGAAGAGGGACAGCAGCACCCAGAACTCCTTCTCGGGGTGGCCTGGTGTCGTCAGCC
60280 + + + + + + 60339
ACGTCTTCTCCCTGTCGTCGTGGGTCTTGAGGAAGAGCCCCACCGGACCACAGCAGTCGG
39 NCFLSLLVWFEKEPHGPTTL
GCTGGATCAGCGCGGCCCGGTCCGGGTAGTCGAGCGCCACGAAGGGCGGGACCGTCGGGT
60340 + + + + + + 60399
CGACCTAGTCGCGCCGGGCCAGGCCCATCAGCTCGCGGTGCTTCCCGCCCTGGCAGCCCA
39 RQILAARDPYDLAVFPPVTP
CGGGAGCCAGGCGGCGCTCCGCCGCGTAGGCCAGCGCGTGCTCGTTCACCAGGCGCACCA
60400 + + + + + + 60459
GCCCTCGGTCCGCCGCGAGGCGGCGCATCCGGTCGCGCACGAGCAAGTGGTCCGCGTGGT
39 DPALRREAAYALAHENVLRV
GGTCGTCCAGACCCTCGTGGATGCCGGTCGCATCCCATTGCAGGAGCTCCAGGGCTCCCG
60460 + + + + + + 60519
CCAGCAGGTCTGGGAGCACCTACGGCCAGCGTAGGGTAACGTCCTCGAGGTCCCGAGGGC
39 LDDLGEHIGTADWQLLELAG
CCTGGACGGCGCCACCGCCGGTGGACACCCCCGCGATGGCCCGGTCGTCCGCGAAGCGCT
60520 + + + + + + 60579
GGACCTGCCGCGGTGGCGGCCACCTGTGGGGGCGCTACCGGGCCAGCAGGCGCTTCGCGA
39 AQVAGGGTSVGAIARDDAFR
TCTGGCCCGGCACGATCGTGTCCGCGTAGGCCTCCAGGGTCATGGTCCGGATATCGCCGG
60580 + + + + + + 60639
AGACCGGGCCGTGCTAGCACAGGCGCATCCGGAGGTCCCAGTACCAGGCCTATAGCGGCC
39 KQGPVITDAYAELTM (ORF39)
CCGGCGCCCCTCGCTCATTGTCGTCGCGCAACTCGCTCTCCATTCTCGCAGTCCGGAGTG
60640 + + + + + + 60699
GGCCGCGGGGAGCGAGTAACAGCAGCGCGTTGAGCGAGAGGTAAGAGCGTCAGGCCTCAC
GGATGCCTTGTGGCGAGGAGAAAGCTAGGTTCGTTCGACCGGTTCAAGCAACTAGCCAAA
60700 + + + + + + 60759
CCTACGGAACACCGCTCCTCTTTCGATCCAAGCAAGCTGGCCAAGTTCGTTGATCGGTTT
GTCGAGGCGACCTTGAAACCGACTCCACGGAGTTGGCGCGAAGCGGCGGATGGATTACAC
60760 + + + + + + 60819
CAGCTCCGCTGGAACTTTGGCTGAGGTGCCTCAACCGCGCTTCGCCGCCTACCTAATGTG
GCGCGGGCGAGCGGCTCACTAGTCTGGCCGCACGGATGTCTTCATCACCTGCACGTGGAA
60820 + + + + + + 60879
CGCGCCCGCTCGCCGAGTGATCAGACCGGCGTGCCTACAGAAGTAGTGGACGTGCACCTT
AAGCTTCTGCACGGGCACCGCATGTGGAAGTGAGCCCTGGTCTCATGTCTTGGGGGAAAC
60880 + + + + + + 60939
TTCGAAGACGTGCCCGTGGCGTACACCTTCACTCGGGACCAGAGTACAGAACCCCCTTTG
GTGAAAAGTGACTCTGCCCAACGCGCCGTGGAGCGATCACGCCGTGTCGTACGGATCGAT
60940 + + + + + + 60999
CACTTTTCACTGAGACGGGTTGCGCGGCACCTCGCTAGTGCGGCACAGCATGCCTAGCTA
40 VKSDSAQRAVERSRRVVRID
(ORF40)
GAACTCATTCCCGCCGATTCCCCGCGCCTGAACGGAATCGATCGTTCCCATGTGCAGCGC
61000 + + + + + + 61059
CTTGAGTAAGGGCGGCTAAGGGGCGCGGACTTGCCTTAGCTAGCAAGGGTACACGTCGCG
40 ELIPADSPRLNGIDRSHVQR
CTCGCGACCGTGTACGCGTCCCTGCCGCCGGTCCTGGTGCACCGCCCGACCATGCGGGTC
61060 + + + + + + 61119
GAGCGCTGGCACATGCGCAGGGACGGCGGCCAGGACCACGTGGCGGGCTGGTACGCCCAG
40 LATVYASLPPVLVHRPTMRV
GTCGACGGCATGCACCGCATCGGCGCGGCCCGCCTGAAGGGGCTGGACACGGTCGAGGTC
61120 + + + + + + 61179
CAGCTGCCGTACGTGGCGTAGCCGCGCCGGGCGGACTTCCCCGACCTGTGCCAGCTCCAG
40 VDGMHRIGAARLKGLDTVEV
ACCTTCTTCGAGGGCGCCGAGGAGCAGGTGTTCCTGCGTTCCGTCGCGGCGAACATCACC
61180 + + + + + + 61239
TGGAAGAAGCTCCCGCGGCTCCTCGTCCACAAGGACGCAAGGCAGCGCCGCTTGTAGTGG
40 TFFEGAEEQVFLRSVAANIT
AACGGCCTGCCGTTGTCGGTGGCCGACCGCAAGACCGCCGCGGCCCGCATTCTGGCCTCC
61240 + + + + + + 61299
TTGCCGGACGGCAACAGCCACCGGCTGGCGTTCTGGCGGCGCCGGGCGTAAGACCGGAGG
40 NGLPLSVADRKTAAARILAS
CACCCGACCCTGTCCGACCGCGCGGTCGCCGCACACGTCGGCCTCGACGCCAAGACCGTG
61300 + + + + + + 61359
GTGGGCTGGGACAGGCTGGCGCGCCAGCGGCGTGTGCAGCCGGAGCTGCGGTTCTGGCAC
40 HPTLSDRAVAAHVGLDAKTV
GCGGGGGTACGGACGTGTTCAGCCGCGGGTTCTCCGCTGCTGAACATGCGCACCGGGGCG
61360 + + + + + + 61419
CGCCCCCATGCCTGCACAAGTCGGCGCCCAAGAGGCGACGACTTGTACGCGTGGCCCCGC
40 AGVRTCSAAGSPLLNMRTGA
GACGGCCGCGTCCACCCGTTGGACCGCACCGCCGAACGCCTGCACGCGGCCGCGCTGCTG
61420 + + + + + + 61479
CTGCCGGCGCAGGTGGGCAACCTGGCGTGGCGGCTTGCGGACGTGCGCCGGCGCGACGAC
40 DGRVHPLDRTAERLHAAALL
ACCCAGGACCCGGGACTCCCGTTGCGCTCCGTCGTCGAGCAGACGGGGCTGTCGCTGGGC
61480 + + + + + + 61539
TGGGTCCTGGGCCCTGAGGGCAACGCGAGGCAGCAGCTCGTCTGCCCCGACAGCGACCCG
40 TQDPGLPLRSVVEQTGLSLG
ACGGCCCACGACGTCCGCCGTCGGCTGCTGCGGGGCGAGGACCCGGTCCCGCAGAACCGG
61540 + + + + + + 61599
TGCCGGGTGCTGCAGGCGGCAGCCGACGACGCCCCGCTCCTGGGCCAGGGCGTCTTGGCC
40 TAHDVRRRLLRGEDPVPQNR
CAGAGCGCGATGCTGGAGCCGGGACTCGCCCCGCAGAAGAAGGCGACGGCCAAGCCGCCC
61600 + + + + + + 61659
GTCTCGCGCTACGACCTCGGCCCTGAGCGGGGCGTCTTCTTCCGCTGCCGGTTCGGCGGG
40 QSAMLEPGLAPQKKATAKPP
GTCGGCCCGGCCGCCCGTCCGGTCCCGAAGGTGCCGCCCGCCGTCGCCGGCAGGCCGCCG
61660 + + + + + + 61719
CAGCCGGGCCGGCGGGCAGGCCAGGGCTTCCACGGCGGGCGGCAGCGGCCGTCCGGCGGC
40 VGPAARPVPKVPPAVAGRPP
GTGTCACCGCGGTCCCGGGCCCCGCTGGAGGCGCTGCGCAAGCTCTCCAACGACCCCTCC
61720 + + + + + + 61779
CACAGTGGCGCCAGGGCCCGGGGCGACCTCCGCGACGCGTTCGAGAGGTTGCTGGGGAGG
40 VSPRSRAPLEALRKLSNDPS
CTGCGCCACTCCGACCAGGGGCGCGAACTCATGCGCTGGCTGCACAACCGGTTCGTCGTC
61780 + + + + + + 61839
GACGCGGTGAGGCTGGTCCCCGCGCTTGAGTACGCGACCGACGTGTTGGCCAAGCAGCAG
40 LRHSDQGRELMRWLHNRFVV
GACGAGGCGTGGCGCCGGCGCGCGGACGCGGTCCCGGCCCACTGCGTCGACTCGATGGCG
61840 + + + + + + 61899
CTGCTCCGCACCGCGGCCGCGCGCCTGCGCCAGGGCCGGGTGACGCAGCTGAGCTACCGC
40 DEAWRRRADAVPAHCVDSMA
GAGCTGGCGCAGCACTGCTCGGACGCCTGGCACCGGTTCGCCGAGGAGATGGTTCGGCGC
61900 + + + + + + 61959
CTCGACCGCGTCGTGACGAGCCTGCGGACCGTGGCCAAGCGGCTCCTCTACCAAGCCGCG
40 ELAQHCSDAWHRFAEEMVRR
CGGCACAGCGCCGCGGCCGACGGCTCCGGACTCCGCACGACTCAGCCAACTCGCCGTTGA
61960 + + + + + + 62019
GCCGTGTCGCGGCGCCGGCTGCCGAGGCCTGAGGCGTGCTGAGTCGGTTGAGCGGCAACT
40 RHSAAADGSGLRTTQPTRR*
(ORF40) -
CGGCCTACTTCGACAGGGAGTTACGGTGACCACGAACACCATCGAGGACGCGGTCCGCCG
62020 + + + + + + 62079
GCCGGATGAAGCTGTCCCTCAATGCCACTGGTGCTTGTGGTAGCTCCTGCGCCAGGCGGC
41 (ORF41) VTTNTIEDAVRR
GGTCGTCGAGTACATGCACGTCAACCTGGGTCAGAACCTCACGATCGATGACATGGCGCG
62080 + + + + + + 62139
CCAGCAGCTCATGTACGTGCAGTTGGACCCAGTCTTGGAGTGCTAGCTACTGTACCGCGC
41 VVEYMHVNLGQNLTIDDMAR
CACGGCGATGTTCAGCAAGTTCCATTTCACCCGCATCTTCCGCGAAGTCACCGGTACCTC
62140 + + + + + + 62199
GTGCCGCTACAAGTCGTTCAAGGTAAAGTGGGCGTAGAAGGCGCTTCAGTGGCCATGGAG
41 TAMFSKFHFTRIFREVTGTS
TCCCGGGCGTTTCCTGTCCGCCTTACGGATTCAGGAGGCCAAGAGACTTCTCGTGCACAC
62200 + + + + + + 62259
AGGGCCCGCAAAGGACAGGCGGAATGCCTAAGTCCTCCGGTTCTCTGAAGAGCACGTGTG
41 PGRFLSALRIQEAKRLLVHT
TGCACTCAGTGTGGCCGATATCAGCAGTCAGGTCGGCTACAGCAGTGTCGGTACTTTCAG
62260 + + + + + + 62319
ACGTGAGTCACACCGGCTATAGTCGTCAGTCCAGCCGATGTCGTCACAGCCATGAAAGTC
41 ALSVADISSQVGYSSVGTFS-
TTCTCGCTTCAAGGCCTGTGTGGGGCTTTCCCCGAGCGCCTATCGCGACTTCGGCGGGGT
62320 + + + + + + 62379
AAGAGCGAAGTTCCGGACACACCCCGAAAGGGGCTCGCGGATAGCGCTGAAGCCGCCCCA
41 SRFKACVGLSPSAYRDFGGV-
GCAGCCGGGTTTTCCCTCCGCCGCGGCCCGTCTCACTCCCACCGCGCACAATCCCTCCGT
62380 + + + + + + 62439
CGTCGGCCCAAAAGGGAGGCGGCGCCGGGCAGAGTGAGGGTGGCGCGTGTTAGGGAGGCA
41 QPGFPSAAARLTPTAHNPSV-
GCGCGGCCGCATTCACTCCGCCCCGGGTGACAGGCCCGGAAGGATCTTCGTGGGCCTGTT
62440 + + + + + + 62499
CGCGCCGGCGTAAGTGAGGCGGGGCCCACTGTCCGGGCCTTCCTAGAAGCACCCGGACAA
41 RGRIHSAPGDRPGRIFVGLF-
CCCCGGCAGGATGCGCCAGGGCCGCCCGGCGCGCTGGACCGTCATGGAGAGTCCCGGGGC
62500 + + + + + + 62559
GGGGCCGTCCTACGCGGTCCCGGCGGGCCGCGCGACCTGGCAGTACCTCTCAGGGCCCCG
41 PGRMRQGRPARWTVMESPGA-
CTTCGAGCTCCGGGACGTGCCCGTGGGCACCTGGCACATCCTGGTCCACTCCTTCCCCGC
62560 + + + + + + 62619
GAAGCTCGAGGCCCTGCACGGGCACCCGTGGACCGTGTAGGACCAGGTGAGGAAGGGGCG
41 FELRDVPVGTWHILVHSFPA-
CGGACACCGGCCGCACCAGCTCGACTCCGAACCGCTGTTGCTCGGGCACAGCGGACCGCT
62620 + + + + + + 62679
GCCTGTGGCCGGCGTGGTCGAGCTGAGGCTTGGCGACAACGAGCCCGTGTCGCCTGGCGA
41 GHRPHQLDSEPLLLGHSGPL-
CGTGGTGCACCCCGGTGCCCTGCTCCGGCCGGCGGACATCCTCCTGCGCGCGGTGGACGC
62680 + + + + + + 62739
GCACCACGTGGGGCCACGGGACGAGGCCGGCCGCCTGTAGGAGGACGCGCGCCACCTGCG
41 VVHPGALLRPADILLRAVDA-
CCTCGATCCACCGGTCCTGCTGGCCCACTTCGCGCTGGAGAGCCGCCTCACCTCGCCGTA
62740 + + + + + + 62799
GGAGCTAGGTGGCCAGGACGACCGGGTGAAGCGCGACCTCTCGGCGGAGTGGAGCGGCAT
41 LDPPVLLAHFALESRLTSPY-
42 (ORF42) * R A T -
CTCACCGTCATCGGTAGCCCTCCGCGCATCCGCAGGGAGAGCATGGGTTCGGCAACCGCC
62800 + + + + + + 62859
GAGTGGCAGTAGCCATCGGGAGGCGCGTAGGCGTCCCTCTCGTACCCAAGCCGTTGGCGG
41 SPSSVALRASAGRAWVRQPP-
42 SVTMPLGGRMRLSLMPEAVA-
CGGTGTCCGGCGACGGTACGCAGATCGAGATCGCGGGTGACCAGGGCCGTGACGAACACC
62860 + + + + + + 62919
GCCACAGGCCGCTGCCATGCGTCTAGCTCTAGCGCCCACTGGTCCCGGCACTGCTTGTGG
41 GVRRRYADRDRG* (ORF41)
42 RHGAVTRLDLDRTVLATVFV-
GCCTCCATCATCCCGAGGTTGCTGCCGACGCAGAACCGGGGCCCCGCGCCGAACGGGATG
62920 + + + + + + 62979
CGGAGGTAGTAGGGCTCCAACGACGGCTGCGTCTTGGCCCCGGGGCGCGGCTTGCCCTAC
42 AEMMGLNSGVCFRPGAGFPI-
TACGCGTACCGCGGCCGGTCGGCGGTCTGCCGGGGTTCGAACCGCTCGGGGTCGAAGCGC
62980 + + + + + + 63039
ATGCGCATGGCGCCGGCCAGCCGCCAGACGGCCCCAAGCTTGGCGAGCCCCAGCTTCGCG
42 YAYRPRDATQRPEFREPDFR-
TCGGGGTCCTCCCACAGCCCCGGATGGCGGTGCATGATGTACGGGCAGACCAGCACATCC
63040 + + + + + + 63099
AGCCCCAGGAGGGTGTCGGGGCCTACCGCCACGTACTACATGCCCGTCTGGTCGTGTAGG
42 EPDEWLGPHRHMIYPCVLVD-
GATCCGGCGGACACCGTGTAGCCGCCGACCACATCGCGTTGCTGGGCCACCCTGGGCAGG
63100 + + + + + + 63159
CTAGGCCGCCTGTGGCACATCGGCGGCTGGTGTAGCGCAACGACCCGGTGGGACCCGTCC
42 SGASVTYGGVVDRQQAVRPL
ATCCC
63160 + 63164
TAGGG
42 I G -